The National Center for Bilingual Research (NCBR) intends to 
develop a large corpus of the language of bilingual children. 
This report surveys the available computer programs which could 
potentially aid in the linguistic analysis of the NCBR corpus by 
automating a number of labor-intensive and time-consuming 
linguistic analyses. 

Two criteria guided the search for applicable computer 
programs. First, programs which automate the linguistic analyses 
that form the basis of child language research on monolinguals 
were preferred over programs which automate analyses not 
typically used in child language research. Second, the computer 
programs had to be easily implemented on the UCLA IBM 370/3033 
computer. 

Eight computer programs which met at least one of the 
criteria were evaluated in terms of their potential usefulness to 
NCBR. It was determined that the Computer Assisted Language 
Analysis System (CALAS) was the most promising in terms of 
capabilities and cost. A series of programs which could be used 
immediately was located at UCLA; however, these programs are 
limited to word frequency counts and concordances based on 
terminal strings. 

Automatic Linguistic Analysis 



I. Introduction 

The analysis of linguistic data has proven to be a 
time-consuming, labor-intensive effort. The purpose of this 
report is to examine a series of computer-assisted alternatives 
which reduce the amount of time and effort required for 
linguistic data analysis. In particular, a set of recommendations 
is made with respect to the needs of the National Center for 
Bilingual Research, which is presently collecting a large corpus 
of child language from bilingual children. 



Computational linguistics is a field that has been devoted 
to the automatic processing of linguistic information, whether 
for the machine translation of one language to another or for 
the analysis of textual and discourse information. Computational 
linguistics became very active in the late 1950's with the advent 
of large computational machines. Since that time the field has 
developed in several directions, and has been supplemented by the 
newer field of Artificial Intelligence. This report presents 
a brief review of the goals and accomplishments of these two 
fields, followed by a discussion of desirable linguistic 
analyses for the NCBR corpus. Finally, a series of computer 
programs which could potentially aid in automating the 
desirable linguistic analyses is evaluated in terms of their 
ease of implementation by NCBR. 

II. Computational Linguistics and Artificial Intelligence 

Research in computational linguistics generally falls into 
one of three areas: machine translation of one language to 
another, computer validation of linguistic theory, or 
computerized linguistic analysis of text or discourse. Machine 
translation of one language to another was an area of research 
heavily funded in the late 1950's and early 1960's. It was hoped 
that computer programs would be able to automatically translate 
written documents or even intercepted audio signals. It is 
generally agreed that these early efforts failed not because the 
computers lacked sufficient computational power but because 
we simply did not have an adequate understanding of the structure 
of the rules of natural languages (Chomsky, 1957, 1965). The 
most unsettling discovery of these early attempts was that, given 
a dictionary of the words of a language and the syntactic rules 
of that language, the computer still could not generate the 
meaning of a sentence. What was missing was a set of rules which 
combines the meanings of individual words with the syntactic 
structure of a sentence to produce the meaning of that sentence. 
The delineation of these combination rules of semantic 
interpretation and the reassessment of the structure of syntactic 
rules have received considerable attention at the theoretical 
level during the last twenty years (Bresnan, 1976; Chomsky, 1965, 
1976; Jackendoff, 1972; Katz & Fodor, 1964; Lakoff, 1971; 
Montague, 1974; Partee, 1975, 1976). 

Today, although efforts are still being made in 
machine translation of one language to another (see discussion 
below), a large part of the field of computational linguistics is 
devoted to testing contemporary advances in linguistic 
theory. That is, a given formalism in linguistic theory is to be 
preferred if the correct meaning or the correct syntactic parse 
of a sentence can be assigned by computer. Simultaneous with 
this effort has been the emergence of the field of Artificial 
Intelligence, which seeks to have the computer not only 
understand natural language but also solve complex problems. The 
goal of several projects has been the development of a computer 
program able to understand a sentence, to make an inference based 
on the meaning of that sentence, and then to use that inference 
as the partial solution to a given problem. Because so much of 
the effort in Artificial Intelligence involves the understanding 
of linguistic information, the computational linguist and the 
researcher in Artificial Intelligence have many shared goals. 

A number of computer programs designed to parse the 
syntactic structure of a sentence have been written to test 
competing linguistic theories of syntactic structure. Mitchell 
Marcus, who is currently at Bell Laboratories, has written a 
deterministic syntactic parser which incorporates a number of 
constraints on linguistic rules proposed by Chomsky (Chomsky, 
1975, 1976; Marcus, 1978). Ron Kaplan of Xerox Palo Alto 
Research Center is currently implementing a syntactic parser 
based on his previously developed Augmented Transition Network 
(ATN) parser and on Joan Bresnan's Realistic Grammar (Bresnan, 
1978), which is a competing theory to Chomsky's. Martin Kay, also 
of Xerox, is currently implementing another parser based on 
Systemic Grammar. These parsers are similar in that each was 
developed to test a theory, and, as such, none is a comprehensive 
parser of English. Each consists of only a subset of the rules 
of English, and thus is not generally applicable to the task of 
analyzing a large corpus of naturalistic data. 

In Artificial Intelligence, more ambitious researchers have 
produced computer programs which not only assign a syntactic 
structure to the sentence, but also interpret the meaning of the 
sentence. The interpreted meaning, along with other stored 
knowledge, is processed to yield inferences which aid in complex 
problem solving. For example, Winograd's SHRDLU conversed with a 
human in English about a small imaginary world of blocks. The 
conversation involved the computer responding to orders to move 
the blocks and keeping track of the relative positions of the 
blocks (Winograd, 1971, 1972). SHRDLU both interpreted and 
produced English sentences. LUNAR, developed at Bolt, Beranek & 
Newman, Inc., is used by NASA to access and manipulate moon rock 
sample data. Again the conversation with LUNAR is in English. 
SOPHIE (Sophisticated Instructional Environment) is capable of 
conversing in English with a student about the student's ideas on 
electronic troubleshooting (Bobrow & Brown, 1975; Brown, Bell & 
Burton, 1974; Brown, Burton & Bell, 1975). GUS (Genial 
Understander System) communicated with travel agency clients who 
wished to travel to a single city on any of several air flights. 
These and other projects by Anderson and Bower (1973), Schank 
(1973, 1976, 1978, 1980), and Norman and Rumelhart (1975) are 
serious attempts to automate the understanding of linguistic 
information. They are, however, only first attempts at what is 
possible. Typically, both the topics of conversation and the 
linguistic structures are restricted to those necessary for the 
tiny artificial domain of the system's "world". There has been 
no attempt to develop a comprehensive set of linguistic rules, 
and lexicons have been limited to a small, interrelated set of 
words. As such, these computer programs are not equipped to 
handle the extensive semantic domains found in any spontaneous 
language corpus. 

The third area of computational linguistics is the 
linguistic analysis of textual and discourse information. 
Computers aid in the analysis of literature and poetry. For 
example, the choice of words by two or more authors can be 
compared by computer concordance programs, which count the number 
of times a particular word or phrase appears and list the context 
of each instance of the word (Ross, 1972; Widman, 1975). In this 
way the choice and use of words of particular authors can be 
compared and analyzed. Concordance programs vary as to which 
linguistic features they can analyze, and have been used to count 
the number of occurrences of syntactic structures (Chrisholm, 
1976) as well as to compute letter and word frequency, spelling 
patterns, and morphological complexity (Spolsky, Holm, Holliday & 
Embry, 1978). In addition to the linguistic analysis of 
literature, there are also programs which analyze scientific 
textual data. For example, the String Parser programs at New 
York University analyze medical texts and other scientific 
textual information (Fitzpatrick & Sager, 1974; Hobbs & Grishman, 
1976; Sager, 1976). The input in each of these cases is 
well-formed grammatical sentences of English, and the syntactic 
rules in these programs assume grammatically correct input 
sentences. 
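
The two basic operations of a concordance program described 
above, counting word occurrences and listing each occurrence in 
its context, can be sketched in a few lines of Python. This is 
only an illustrative sketch and not a reconstruction of any of 
the programs cited; the function names, the tokenizing rule, and 
the sample text are assumptions made for the example. 

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens.
    (Assumed rule: a word is a run of letters or apostrophes.)"""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text):
    """Return a Counter mapping each word to its number of occurrences."""
    return Counter(tokenize(text))

def concordance(text, keyword, width=2):
    """List every occurrence of `keyword` with `width` words of context
    on each side, as (token_position, left_context, keyword, right_context)."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((i, left, tok, right))
    return hits

# Hypothetical sample text, for illustration only.
sample = "The dog ran. The dog which belonged to Mary died."
freqs = word_frequencies(sample)
dog_hits = concordance(sample, "dog", width=1)
```

Because this sketch operates only on the terminal string of words, 
it says nothing about syntactic structure; that limitation is the 
reason concordance programs alone are not sufficient for the 
analyses discussed later in this report. 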



Better suited to the linguistic analysis of the NCBR corpus 
are the programs which analyze discourse. Computer programs 
have been designed to analyze interactive dialogue sessions 
between two or more people (Miron, 1973). Dialogues between 
teachers and students, therapist and patient (Wachal & Spreen, 
1970; Colby, Parkinson & Fought, 1974), as well as schizophrenic 
and other pathological language (Pepinsky, 1978) have been 
analyzed by computer programs. The advantage of these programs 
is that they can analyze sentence fragments, one-word utterances 
and discourse-specific features not found in written language. 

III. Analysis of Child Language Corpora 

The National Center for Bilingual Research intends to 
tape record the language of young bilingual children in a 
three-year longitudinal study. The tapes will be transcribed and 
entered into the computer by clerical personnel. The accuracy of 
the transcriptions will be verified by personnel with linguistic 
training. Because the resulting corpus will be quite large, it 
is desirable to automate as much of the linguistic analysis as 
possible. But before considering the actual programs which might 
be used to automate certain types of linguistic analyses, a 
discussion of the particular analyses relevant to child language 
production data is in order. 

Since the transcripts will not contain a phonetic 
transcription of the child's speech, phonological analysis of the 
corpus is not possible. However, the syntactic, semantic and 
conceptual information in the corpus offers a rich base of data 
from which to analyze the complexity of the child's linguistic 
and conceptual development at particular ages. In order to 
evaluate the complexity of the bilingual child's language, it is 
desirable to use at least some of the measures of linguistic 
complexity developed for the analysis of monolingual language. 

One of the most widely used measures of linguistic 
complexity has been the mean length of utterance (MLU) in the 
child's spontaneous speech. It is the best single indicator of 
complexity up to about five morphemes per utterance (Brown, 
1973). It indicates both syntactic and semantic complexity, 
which are highly correlated with conceptual complexity. It would 
be highly desirable to compute MLU for the NCBR corpus, as it 
would provide the basis for comparison with the extensive child 
language literature on monolinguals. 
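
As a rough illustration of the computation, MLU is the total 
number of morphemes divided by the number of utterances. The 
sketch below assumes, purely for the example, that each utterance 
has already been segmented into morphemes by the transcriber, with 
morphemes separated by spaces (e.g. "run -ing"); the report does 
not specify a transcription format, so both the format and the 
function name are hypothetical. 

```python
def mean_length_of_utterance(utterances):
    """Return the mean number of morphemes per utterance.

    Each utterance is assumed to be a string of space-separated
    morphemes (an assumed transcription convention). Empty
    utterances are ignored; an empty sample yields 0.0.
    """
    lengths = [len(u.split()) for u in utterances if u.split()]
    if not lengths:
        return 0.0
    return sum(lengths) / len(lengths)

# Hypothetical morpheme-segmented sample: 3, 2 and 3 morphemes.
sample = ["doggie run -ing", "more milk", "mommy -'s shoe"]
mlu = mean_length_of_utterance(sample)  # (3 + 2 + 3) / 3
```

The arithmetic is trivial once morphemes are marked; the 
labor-intensive step is the morpheme segmentation itself, which is 
why automatic detection of morphemes is taken up below. 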

Slobin (1973) has developed a number of indices as to what 
contributes to syntactic complexity. These are based on the 
following language acquisition universals (taken from Slobin, 
1979). 

1) For any given semantic notion, grammatical realizations 
as postposed forms will be acquired earlier than 
realizations as preposed forms. 

2) The following stages of linguistic marking are typically 
observed: (1) no marking, (2) appropriate marking, (3) 
overgeneralization of marking, (4) full adult system. 

3) The closer a grammatical system adheres to one-to-one 
mapping between semantic elements and surface elements, 
the earlier it will be acquired. 

4) When selection of an appropriate inflection among a 
group of inflections performing the same semantic 
function is determined by arbitrary formal criteria, the 
child initially tends to use a single form in all 
environments. 

5) Semantically consistent grammatical rules are acquired 
early and without significant errors. 

Using Slobin's universals of language acquisition, it is possible 
to predict which syntactic structures will be difficult to learn 
in any language. For example, syntactic rules which are 
inconsistently applied or which attach themselves to the 
beginnings of words rather than to the ends of words are 
considered complex relative to rules which are consistent 
with Slobin's universals. 

Consider the tense system of English. Semantically, English 
expresses three tenses: past, present, and future. 
Syntactically, however, there are only two tense markers: past and 
present. Each semantic expression can be syntactically marked as 
either past or present, as the following examples indicate. The 
examples are taken from Culicover (1975). All three are 
syntactically marked in the present tense, though each one 
semantically represents a different time. 

1) I come home and then John says to me "Where the devil 
have you been all day?" (semantic past) 

2) I choose Mary. (semantic present) 

3) I sail for England next Wednesday. (semantic future) 

This system becomes very complicated for the child when he (or 
she) learns the past tense marker and finds that it does not 
always refer to some time in the past, as in 

4) I would like a glass of milk. (semantic present; would 
is marked syntactically past) 

These examples illustrate Slobin's third universal, that when 
there is not a one-to-one mapping between semantic elements and 
surface syntactic markings, the language learning task becomes 
more difficult. 

In order to make specific comparisons with regard to the 
syntactic complexity of the child's speech, the level of analysis 
must be quite detailed. For example, Brown and others (Brown, 
1973; Brown & Bellugi, 1964; Brown, Cazden & Bellugi, 1969) have 
traced the development of 14 grammatical morphemes in English. 
Some of these are: present progressive (-ing), the prepositions 
on and in, plural, possessive ('s), uncontracted copula (is), 
articles (the and a), and irregular and regular past tense. To 
automate this type of syntactic analysis, the computer program 
must be able to detect individual morphemes when they appear as 
parts of words. 
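
To make the requirement concrete, the following Python sketch 
flags candidate occurrences of some of the morphemes listed above 
by simple suffix matching and table lookup. It is a deliberately 
crude illustration, not a method proposed by any of the cited 
authors: the suffix table, the short irregular-past list, and the 
function names are assumptions, and suffix matching overgenerates 
(for instance, it would flag "bus" as a plural), so its output 
would still need to be checked by a linguist. 

```python
from collections import Counter

# Assumed lookup tables, for illustration only.
FUNCTION_WORDS = {
    "in": "preposition", "on": "preposition",
    "the": "article", "a": "article",
    "is": "uncontracted copula",
}
IRREGULAR_PAST = {"went", "came", "ate", "fell", "broke"}
# Order matters: "'s" must be tried before the bare "s".
SUFFIXES = [
    ("ing", "present progressive"),
    ("'s", "possessive"),
    ("ed", "regular past"),
    ("s", "plural"),
]

def tag_morphemes(word):
    """Return a list of candidate grammatical-morpheme labels for one word."""
    w = word.lower()
    if w in FUNCTION_WORDS:
        return [FUNCTION_WORDS[w]]
    if w in IRREGULAR_PAST:
        return ["irregular past"]
    for suffix, label in SUFFIXES:
        # Require a stem of at least two characters before the suffix.
        if w.endswith(suffix) and len(w) > len(suffix) + 1:
            return [label]
    return []

def count_morphemes(utterance):
    """Count candidate grammatical morphemes in a space-separated utterance."""
    counts = Counter()
    for word in utterance.split():
        counts.update(tag_morphemes(word.strip('.,?!"')))
    return counts

counts = count_morphemes("the dog is running on Mary's bed")
```

Even this small sketch shows why a lexicon is needed: irregular 
past forms cannot be found by suffix matching at all, and the 
suffix -s is shared by several distinct morphemes. 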

Other syntactic analyses which are important in determining 
the syntactic complexity of the child's language include analysis 
at the phrasal level. For example, the syntactic structure of 
(5) is generally regarded as more complex than that of (6). This 
is because (5) includes an embedded sentence in the subject noun 
phrase of the sentence whereas (6) does not have this additional 
structure at the surface level of analysis. 

(5) The dog which belonged to Mary died. 

(6) Mary's dog died. 

Thus, it would be very useful to be able to analyze the child's 
utterances according to their phrasal complexity. This involves 
first determining what part of speech each word in the sentence 
is and then determining which syntactic rule applies to the 
sequence of syntactic categories. In order to perform this type 
of analysis on the computer, it is necessary to have a lexicon of 
the common words coded as to their syntactic category. 
However, this is sometimes difficult to implement since part of 
speech determination is often dependent on the placement of the 
word in the phrase or sentence. So, if a lexicon with associated 
syntactic categories is to be maintained, we must allow for the 
occurrence of more than one syntactic category for a particular 
word. This introduces ambiguity into the analysis, which must 
be resolved at some later stage of analysis. 
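
A minimal sketch of such a lexicon, and of resolving the resulting 
ambiguity at a later stage, is given below in Python. The lexicon 
entries, the category labels, and the two allowed category 
sequences are hypothetical and chosen only to illustrate the idea; 
a realistic system would use a far larger lexicon and a proper set 
of phrase structure rules. 

```python
from itertools import product

# Hypothetical lexicon: each word maps to ALL of its possible categories.
LEXICON = {
    "the": {"DET"},
    "dog": {"N"},
    "can": {"N", "V", "AUX"},
    "run": {"N", "V"},
}

# Hypothetical stand-in for phrase structure rules: the category
# sequences that count as well-formed in this toy example.
ALLOWED_SEQUENCES = {
    ("DET", "N", "AUX", "V"),
    ("DET", "N", "V"),
}

def candidate_taggings(words):
    """Enumerate every category sequence the lexicon permits.
    Words missing from the lexicon are tagged UNKNOWN."""
    options = [sorted(LEXICON.get(w.lower(), {"UNKNOWN"})) for w in words]
    return [tuple(seq) for seq in product(*options)]

def resolve(words):
    """Keep only the candidate sequences licensed by the allowed patterns."""
    return [seq for seq in candidate_taggings(words) if seq in ALLOWED_SEQUENCES]

words = "the dog can run".split()
candidates = candidate_taggings(words)   # 1 * 1 * 3 * 2 = 6 sequences
resolved = resolve(words)
```

The number of candidate sequences is the product of the number of 
categories per word, so the ambiguity grows quickly with sentence 
length; the later syntactic stage is what prunes it. 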

The child's mastery of coordinate and subordinate structures 
must also be analyzed by a program with phrasal/sentential level 
capabilities. This is somewhat easier to determine 
automatically, since the program can search for the coordinating 
and subordinating conjunctions which introduce these clausal 
structures. Although there is ambiguity as to the syntactic 
category of these conjunctions, it is fairly easy to resolve the 
ambiguity via the surrounding syntactic structure of the 
sentence, which can be readily expressed in simple phrase 
structure rules. Concordance programs could search for all the 
instances of the coordinating conjunctions and, or, and then, 
but first, and the subordinating conjunctions because, although, 
when, while, before, after, until, since. The "hits" of the 
search could then be categorized as to whether the conjunctions 
conjoined sentences or phrases. 
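
The search step described above can be sketched as follows in 
Python. The conjunction lists are those given in the text; the 
function name, the output layout, and the sample utterances are 
assumptions for the example. The sketch only locates the hits; 
deciding whether each conjunction joins sentences or phrases is 
left to a later (manual or syntactic) step. 

```python
import re

COORDINATING = ["and then", "but first", "and", "or"]
SUBORDINATING = ["because", "although", "when", "while",
                 "before", "after", "until", "since"]

# Map each conjunction to its type.
KIND = {c: "coordinating" for c in COORDINATING}
KIND.update({s: "subordinating" for s in SUBORDINATING})

# Try longer conjunctions first so that "and then" wins over "and".
_alternatives = sorted(KIND, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in _alternatives) + r")\b",
    re.IGNORECASE,
)

def find_conjunctions(utterances):
    """Return (utterance_number, conjunction, type, utterance) for each hit."""
    hits = []
    for n, utterance in enumerate(utterances):
        for match in _PATTERN.finditer(utterance):
            conj = match.group(1).lower()
            hits.append((n, conj, KIND[conj], utterance))
    return hits

# Hypothetical sample utterances.
sample = ["I fell down because I ran", "mommy and daddy", "eat and then play"]
hits = find_conjunctions(sample)
```

Each hit carries its utterance number and full context, so the 
subsequent categorization can be done from the listing alone. 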

The use of subordinating conjunctions indicates not only 
syntactic sophistication but also the mastery of difficult 
semantic concepts. These, in addition to logical connectors such 
as if...then, either...or, and suppose, indicate advanced semantic 
development. The line between syntactic development and semantic 
development is also blurred when we consider the development of 
complex verbs, such as believe, understand, volunteer, realize, 
imagine, etc., which take sentential or infinitival complements. 

In sum, there are a variety of linguistic analyses which 
measure the syntactic/semantic and conceptual complexity of child 
language. Many of these measures require detailed linguistic 
analysis. To perform these analyses automatically requires a 
sophisticated computer program. 

IV. Criteria for Evaluating Automatic Linguistic Analysis 
Programs 

Two overriding criteria served as the basis for the 
evaluation of computer programs for the automatic linguistic 
analysis of the NCBR corpus. The first was to seek computer 
programs which automated as much of the linguistic analysis as 
possible. That is, programs which could analyze the phrase 
structure of a sentence were considered more desirable than 
simple concordance programs which compute frequencies at the 
terminal string level only. The second and more important 
consideration was the amount of effort and time required to 
implement the computer program on the IBM 370/3033 at UCLA. From 
these two general considerations, the following list of questions 
was generated. 

1) Is the program designed to analyze spontaneous discourse 
or textual information? The problem here is that if the 
program is designed with the assumption that each 
sentence will be a grammatical sentence of English, then 
a considerable amount of effort must be spent in writing 
a new set of syntactic surface structure rules which 
will allow for sentence fragments and one-word 
utterances typical of spontaneous discourse. 
Additionally, since the grammatical rules of child 
language differ from adult grammatical rules, provisions 
must be made in the program for the addition of the 
rules of child grammar. 

2) What is the structure of the lexicon in the program, and 
how much effort is required to add new words to it? In 
particular, what attributes are associated with each 
word (e.g. inflectional morphemes, syntactic 
categorization rules)? 

3) What is the output of the program? Does it count the 
number of occurrences of a particular structure? Does 
it keep track of where in the corpus the structure of 
interest occurred? Is it possible to obtain a listing 
of the surrounding context of the structure in question? 
Is the type of output under user control? 

4) How transportable is the program to the UCLA IBM 
370/3033? 

♦ Is there a programmer who is currently assigned to 
maintain the code? 

♦ What is the current amount of usage of the program? 

♦ What machine does the program run on? Are there any 
machine-dependent utilities required for the 
implementation of the program? 

♦ What operating system does the program run under? 

♦ What programming language is the code written in? 

5) Can the program be used via remote timesharing? 

6) How much main memory does the program require? 

7) How costly is it to use the program? 

♦ How long does the program take to analyze a 10-word 
sentence? 

8) What is the relationship between the size of the lexicon 
and the amount of disk storage? 

9) What documentation is available? 

♦ Are there user manuals? 

♦ Are there software maintenance manuals? 

♦ Is there operations documentation? 

V. Surveyed Linguistic Analysis Computer Programs 

As discussed in the introduction, the computer programs 
which purported to analyze textual and discourse information were 
deemed the most appropriate for the purposes of analyzing the 
NCBR corpus. This is because these programs attempted to be 
comprehensive in the development of their syntactic parsing rules 
and their lexical entries. Additionally, we discovered two 
machine translation programs which are very sophisticated despite 
a reduction in government funding for machine translation 
projects. We begin with the two machine translation programs, 
both of which are capable of translations between English and 
Spanish. 

V.A. Brigham Young University Project 

The theoretical basis for this machine-assisted translation 
project is Junction Grammar, developed by Eldon Lytle (Lytle, 
Packard, Gibb, Melby & Billings, 1975). Junction Grammar 
representations consist of word-sense information interrelated by 
junctions which contribute syntactic and semantic information. 
In the first stage of the translation system, the program 
interacts with a human operator who aids the machine in resolving 
ambiguities, producing a representation of the meaning of the 
text. The second and third stages of the translation process 
are automatic transfer and synthesis into one or more target 
languages. 

Currently, there are two versions of the Junction Grammar 
machine translation system. The first is still at Brigham Young 
University. It is a highly interactive system, which requires a 
linguist who is conversant in Junction Grammar to properly 
resolve the ambiguities which the machine presents to the human 
operator. It is capable of sophisticated linguistic parses; e.g. 
it can note the difference between restrictive and 
non-restrictive relative clauses, and can distinguish count 
versus mass nouns and generic versus specific senses, among 
others. Unfortunately, at the present time, the Brigham Young 
University project is under experimental revision, and the code 
is not transportable. When the code is intact, it runs on an IBM 
370/130 and is written in PL/I. Time-sharing is available. 

The other version of the Junction Grammar project is a 
commercially available machine-assisted translation program. 
This version was developed by Eldon Lytle and others and is 
available from ALP Systems, 450 N. University, Provo, Utah, 
84601. This version has eliminated the need for a trained 
linguist to resolve the ambiguities. The system is highly 
interactive and is capable of translating English text into 
Spanish, French and German. The lexicon is quite extensive, with 
5000 general purpose words and specific lexicons in computer 
science, heavy equipment, and systems design. Dr. Lytle 
indicated that it is fairly easy to add more words to the lexicon 
and that it is suited to the analysis of dialogue as well as 
textual information. Also, it would not be difficult to add child 
language grammar to the other syntactic parsing rules. There are 
two drawbacks to using this system for the NCBR corpus. 
First, it runs on a Data General machine and is written in ALGOL. 
It would be an extensive project (as much as one man year) to 
convert the code to run on the UCLA IBM machine. ALP Systems 
expects to have their programs converted to run on other 
machines, though to date no specific plans have been made for an 
IBM conversion. Secondly, because it is a commercial product, 
NCBR would have to purchase the program, which is fairly 
expensive due to the long development effort by the company. 



V.B. University of Texas, Austin, Linguistics Research Center 

The Linguistics Research Center has developed an 
English-German translation program. It can take a sentence as 
input and generate the syntactic structure of the sentence. 
Currently it has a lexicon of 3,000 words, with specialized 
lexicons in telecommunications and electronic switching systems, 
and in computer systems. There are several drawbacks to using 
this system for the NCBR corpus. First, a highly trained linguist 
would have to write the child language grammar to input into the 
system. Linguists trained in theoretical linguistics typically 
have not had experience in writing the computationally 
unambiguous syntactic rules necessary for machine translation. 
Second, the funding of the Texas project is currently being taken 
over by private sources, and thus all future versions of this 
project will either not be available or will be at commercial 
prices. Third, though the programs are highly portable because 
they are written in UCI LISP, a relatively machine-independent 
high level programming language, a conversion effort is still 
required to run under the IBM operating system. The present 
implementation at Texas is on a DEC 10, but the Texas system is 
currently being converted to INTERLISP, which will run on the DEC 
20. In sum, though the Texas project is well-developed, the 
change in their funding situation means that the currently 
available system will fall into disuse, with the task of 
software maintenance becoming the burden of NCBR. 

The final report of the Texas translation project may be 
obtained after October 1, 1980 from Zbigniew L. Pankowicz, 
Foreign Technology Division, Rome Air Development Center, 
Griffiss AFB, NY 13441. 

V.C. Syracuse University 

In the late 1960's and early 1970's Professor Murray Miron 
directed a number of projects which consisted of computer 
programs to perform frequency analysis of vocabulary and sentence 
patterns in Japanese, Swahili and English (Miron, 1973; Rubama, 
Miron & Pratt, 1973; Sukle, Miron & Pratt, 1973). While those 
programs are capable of relevant linguistic analyses, the 
programs have not been used in the last five years, and thus it 
is extremely unlikely that they are transportable to UCLA. 
Professor Miron currently has a linguistic analyzer called 
General Inquirer II, which was developed for use in analyzing 
dialogue. Professor Miron said that General Inquirer II would be 
ideal for the analysis of the NCBR corpus. That is, it is 
possible to add more syntactic rules to the parser and more words 
to the lexicon. Also, it is capable of generating the types of 
output of interest to NCBR, e.g. frequency counts of parts of 
speech, phrasal and sentential structure, etc. General Inquirer 
II is currently being used to aid the FBI in analyzing threats. 
Professor Miron uses it to develop personality profiles. 
Professor Miron was very interested in developing a collaborative 
effort with NCBR with respect to the use and maintenance of 
General Inquirer II. As with many computer programs which are 
developed with Government funding on a project basis, not enough 
resources were allowed for documentation and software 
maintenance. Professor Miron estimated that if NCBR wanted to use 
the program at Syracuse University, it would take one man year of 
programming effort to make the modifications for child language 
analysis. Furthermore, to transfer the program to UCLA would be 
next to impossible, as the code is a potpourri of different 
programming languages, with no overall design. There is no 
documentation. Finally, running the program takes a large amount 
of random access memory (RAM), which is expensive. 

V.D. New York University, Linguistic String Parser Project 

The Linguistic String Parser developed at NYU is designed 
for the analysis of scientific texts (Fitzpatrick & Sager, 1974). 
The parser takes well-formed complete sentences of English and 
outputs a parse tree for the sentence. Although it would accept 
a noun phrase without a verb or an object, in general it is 
unacceptable for discourse data. Another drawback is that it is 
a non-interactive system, and at the present time there are no 
provisions for outputs other than parse trees. The Linguistic 
String Parser has a large set of syntactic rules as well as an 
extensive lexicon. The lexicon stores a variety of attributes of 
the word, including morphological variants, grammatical 
categories, selectional restrictions, and subcategorization 
rules. Currently the program is running on a CDC 6600 and uses a 
large amount of RAM (600KB). Though it is written in 
FORTRAN, it would still need to be converted to the IBM operating 
system. It is also extremely costly; a ten-word sentence takes 1 
second of CPU time to parse. 

V.E. IBM Projects 

Currently there are two projects of interest at the IBM 
Thomas J. Watson Research Center. The first one is called TQA, 
for transformational question and answering program. It is 
designed to be the natural language interface to a data base 
management system (DBMS). Thus it understands and produces 
English discourse. Presently, it is being used as an interface 
to a municipal data base on land use assessments. Though it is 
capable of extensive syntactic and semantic analysis, this 
program is proprietary to IBM and is thus not available for 
dissemination. 

The second project at IBM is a syntactic parser based on 
Controlled Partition Grammar (Muckstein, 1979). This parser 
takes the output from a speech recognition system, operating 
bottom-up to generate a written version of the text. The 
syntactic parser is constructed to recognize and define surface 
syntactic dependencies based on the parts of speech which have 
been generated by a part-of-speech labeling algorithm. This 
parser has been used to analyze the text of depositions of patent 
attorneys. The sentences average 35 words in length and tend to 
be well-formed grammatically. Dr. Muckstein indicated that it 
would take a considerable amount of effort to adapt the program 
to a child language corpus. Furthermore, since the research was 
supported by IBM and not by government funds, the computer 
programs are most likely proprietary to IBM and hence not 
available. 

V.F. SRI International, DIAGRAM 



SRI has developed a natural language understanding system 
called DIAGRAM, which produces parse trees as its output. These 
parse trees are then semantically interpreted to produce the 
logical meaning of the sentence. The logical meaning can then be 
queried by other computer systems. DIAGRAM currently has a 
lexicon of 5,000 words in English and Spanish. The structure of 
lexical entries is detailed and complex. The verbs alone are 
categorized by some 20 attributes, such as whether they are 
transitive, intransitive, or ditransitive; whether they take 
particles, etc. In terms of modifying the syntactic rules and 
lexicon to accommodate child language grammar, a highly trained 
linguist would need to spend some time with the project linguist, 
Dr. Jane Robinson, in order to learn the system of grammatical 
rules implemented by DIAGRAM. The development of the lexical 
entries is the most difficult task. Mr. Gary Hendricks of the 
project estimated that if SRI were to add 500 new lexical items 
for NCBR and also gave NCBR a two-week training session, the cost 
would be approximately $50,000. If NCBR were to do all of the 
linguistic work, then it would cost approximately $10,000 for 
training. Because DIAGRAM was developed under Government 
funding, the code is available at no charge. 

Installing DIAGRAM on the UCLA computer would require 
converting the code, written in INTERLISP, to the IBM 
operating system. At SRI, DIAGRAM runs on a DEC 10 and a 
Foonly which emulates the DEC 10. The operating systems it runs 
under are 10X and TOPS 20. Mr. Hendricks indicated that SRI 
would make timesharing available on their DEC 10 at the end of 
the year, and that timesharing costs for Government programs are 
inexpensive. In terms of the documentation available for 
DIAGRAM, there are two 20-page manuals for programmers and no 
user manuals. There are five users at SRI. 

DIAGRAM has received praise from the Stanford research 
community and so it deserves careful consideration. Mr. 
Hendricks of SRI suggests NCBR send some sample data to SRI and 
have them run it through DIAGRAM to see if the resultant parse 
would be useful for NCBR's purposes. In terms of CPU time, a 
full parse with semantic interpretation takes approximately one 
second for a ten-word sentence, and a syntactic parse without a 
semantic interpretation takes about 250 msec. Technical reports 
on DIAGRAM are available from Dr. Jane Robinson of SRI. She 
can be reached at (415) 325-6200, extension 4575. 
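
The per-sentence timings quoted above can be scaled up to a whole 
corpus with simple arithmetic. The sketch below is only an 
illustration: the corpus size of 100,000 sentences is a 
hypothetical figure, not an NCBR estimate, and the function name 
is ours. It also assumes every sentence is about ten words long, 
as in the quoted timings. 

```python
def estimated_cpu_hours(n_sentences, seconds_per_sentence):
    """Scale a per-sentence CPU time up to a corpus, in hours."""
    return n_sentences * seconds_per_sentence / 3600.0


# Hypothetical corpus size, chosen only for illustration.
N_SENTENCES = 100_000

# Timings reported for a ten-word sentence.
full_parse_hours = estimated_cpu_hours(N_SENTENCES, 1.0)    # with semantics
syntax_only_hours = estimated_cpu_hours(N_SENTENCES, 0.25)  # syntax only

print(f"full parse:  {full_parse_hours:.1f} CPU hours")
print(f"syntax only: {syntax_only_hours:.1f} CPU hours")
```

Under these assumptions the syntactic-only parse needs one quarter 
of the CPU time of the full parse, whatever the corpus size. 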

V.G. Computer Assisted Language Analysis System (CALAS) 

CALAS was developed to analyze discourse and dialogue 
information. It has been used to analyze interactions between 
students and teachers in a classroom setting and between 
therapist and patient in a clinical setting. CALAS consists of 
three stages. Stage 1, called EYEBALL, assigns the part of 
speech to each word in the sentence. Ambiguities of parts of 
speech are resolved by a human editor. Stage 2, PHRASER, assigns 
aggregates of words to phrase structures. Again a human editor 
eliminates possible ambiguities. Finally, in Stage 3, CLAUSE/CASE 
assigns semantic roles according to Case Grammar. All human 
editing can be done either interactively or off-line. 
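
The division of labor between program and human editor in Stage 1 
can be pictured with a small sketch. This is not CALAS code: the 
toy lexicon, the tag names, and the functions `tag_words` and 
`apply_edits` are all hypothetical, and the sketch only assumes 
that the program proposes a part of speech for each word and 
flags the ambiguous or unknown ones for the editor. 

```python
# Hypothetical toy lexicon: word -> possible parts of speech.
LEXICON = {
    "the": ["DET"],
    "dog": ["NOUN"],
    "runs": ["VERB", "NOUN"],
    "fast": ["ADV", "ADJ"],
}


def tag_words(words, lexicon=LEXICON):
    """Propose one tag per word and list the positions needing review."""
    tagged, needs_review = [], []
    for i, word in enumerate(words):
        candidates = lexicon.get(word.lower(), ["UNKNOWN"])
        tagged.append((word, candidates[0]))  # first candidate as a guess
        if len(candidates) > 1 or candidates == ["UNKNOWN"]:
            needs_review.append(i)  # ambiguous or unknown: ask the editor
    return tagged, needs_review


def apply_edits(tagged, edits):
    """Apply the editor's choices; edits maps word position -> tag."""
    return [(w, edits.get(i, t)) for i, (w, t) in enumerate(tagged)]


tagged, needs_review = tag_words("The dog runs fast".split())
corrected = apply_edits(tagged, {3: "ADV"})  # editor confirms position 3
```

Only the flagged positions reach the editor, which is why the 
editing load depends on how often the first pass is ambiguous. 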

Because CALAS relies on human editing, the computer 
programs are not as complex and costly to run as some of the 
other programs we have discussed (DIAGRAM, Linguistic String 
Parser, and General Inquirer II). The human editor need not 
be a linguist; a good working knowledge of freshman English is 
adequate. The editing process is most important at Stage 1, 
as EYEBALL has an 85% accuracy rate in assigning syntactic 
categories to the words. If the errors are caught in this stage, 
the remaining editing proceeds smoothly. Errors that escape the 
editor in Stage 1 can play havoc with the next two 
stages. 



CALAS is a flexible program and can be easily modified to 
analyze child language data. The program was designed for the 
analysis of discourse, and it is a simple matter to add new 
lexical items to the dictionary as well as change the 
syntactic/semantic rules. For example, the user is asked each 
time he or she logs onto the system whether lexical items are to 
be added or deleted and whether the syntactic/semantic rules are 
to be changed. This feature means that different child grammars 
can be tested for children of different ages (or for different 
languages) in the corpus. This feature seems ideally suited to 
NCBR's needs. 

Another attractive feature of the CALAS program is that 
print routines are designed to feed into SPSS programs. For 
example, frequencies could be computed for: number of words per 
noun phrase, number of complex noun phrases, number of plural 
markers, number of adjectives, nouns, etc., and number of words 
per utterance. While this last item is not mean length of 
utterance (MLU) as used in the child language literature, most, 
if not all, of the information used to calculate typical MLU 
counts can be taken from the CALAS program. 
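
As a rough illustration of the last count mentioned above, the 
following sketch computes words per utterance and its mean. It 
is not CALAS or SPSS output: the function names and the sample 
utterances are hypothetical, and words are taken to be 
whitespace-separated tokens. As noted, this is a word-based 
measure and not the MLU of the child language literature. 

```python
from statistics import mean


def words_per_utterance(utterances):
    """Count whitespace-separated words in each utterance."""
    return [len(u.split()) for u in utterances]


def mean_words_per_utterance(utterances):
    """Mean number of words per utterance (0.0 for an empty sample)."""
    counts = words_per_utterance(utterances)
    return mean(counts) if counts else 0.0


# Hypothetical sample utterances.
sample = ["mommy go store", "quiero mas leche", "no"]
print(words_per_utterance(sample))
print(mean_words_per_utterance(sample))
```

A table of such counts, one row per utterance, is the kind of 
file a statistical package could then summarize by child or age. 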


In terms of transportability, CALAS will run on any IBM 
370 series including the IBM 370/3033 at UCLA. Dr. Naomi Meara 
at the University of Tennessee recently installed CALAS on an IBM 
370/3031 with little difficulty. Most of the programs are 
written in PL/I and one program is written in SPITBOL, which is 
a version of SNOBOL. In order to run CALAS, it is necessary to 
interface through another time sharing machine. The DEC PDP-11/34 
should be sufficient for this purpose. Dr. Meara and Dr. 
Pepinsky at Ohio State University are currently writing a user's 
manual. There are programmers at each institution who have 
served as consultants on CALAS and would be willing to assist 
by phone or letter in the installation of CALAS at UCLA. Both 
Dr. Pepinsky and Dr. Meara thought the installation would proceed 
smoothly. 

CALAS can be obtained from Dr. Pepinsky at Ohio State 
University simply by mailing him a tape or sending him $35 for a 
tape with the program on it. Dr. Pepinsky can be reached at 
(614) 422-5470. 

V.H. UCLA Word Frequency and Concordance Programs 

If NCBR would like to begin some simple linguistic analyses 
immediately, we located a set of programs which are available now 
and could be used with little programming resources on the part 
of NCBR. The advantage of these programs is that they already 
run on the UCLA IBM 370/3033 computer, and they are used 
frequently enough to expect that they are well-maintained. The 
disadvantage is that they only perform word frequency counts and 
concordances of terminal strings specified by the user. But 
because they are relatively simple programs compared to most 
of those reviewed, they are also inexpensive to run. The amount 
of RAM needed depends on the size of the corpus to be 
analyzed. The size of each run could be reduced by 
dividing the NCBR corpus into meaningful subcategories, such as 
analyses by individual child, by calendar period, by age of the 
children, by language, etc. 
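
A word frequency count split by subcategory can be sketched in a 
few lines. The sketch below is not the UCLA program: the record 
layout (a group key paired with an utterance), the tokenizer, and 
the sample data are all assumptions made for illustration. 

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and return its runs of letters as words."""
    return re.findall(r"[^\W\d_]+", text.lower())


def frequencies_by_group(records):
    """Build one frequency table per group key.

    records is an iterable of (group_key, utterance) pairs; the key
    could be a child identifier, a language, or an age band.
    """
    tables = defaultdict(Counter)
    for key, utterance in records:
        tables[key].update(tokenize(utterance))
    return tables


# Hypothetical records keyed by child.
records = [
    ("child01", "the dog the cat"),
    ("child02", "el perro"),
]
tables = frequencies_by_group(records)
print(tables["child01"].most_common(2))
```

Because each group has its own table, only one subcategory needs 
to be held in memory if the records are processed group by group. 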

In addition to word frequency counts, the concordance 
programs can list the sentence in which each word of interest 
appears, as well as list the word in the middle of a page, along 
with the 60 characters preceding and following 
the word. In this way, the context in which the word or phrase 
appears will be listed out for further analyses. These programs 
have a number of other useful features, and we suggest that NCBR 
contact Dr. Rand in the SSL Department for further information. 
Dr. Rand has worked with SWRL programmers in the past on the LAP 
project and understands NCBR's needs in terms of this project. 
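
The keyword-in-context layout described above, with the word 
centered and 60 characters of context on either side, can be 
sketched as follows. This is an illustration only, not the UCLA 
concordance program; the function name `kwic` and its parameters 
are hypothetical. 

```python
import re


def kwic(text, keyword, width=60):
    """Return one fixed-width line per occurrence of keyword.

    Each line holds `width` characters of left context, the keyword
    as it appears in the text, and `width` characters of right
    context, padded so that the keywords line up in one column.
    """
    flat = " ".join(text.split())  # collapse line breaks and extra spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        lines.append(f"{left:>{width}}{m.group(0)}{right:<{width}}")
    return lines


for line in kwic("The dog saw the other dog.", "dog", width=10):
    print(line)
```

Padding the left context to a fixed width is what places every 
occurrence of the keyword in the same column of the listing. 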

Dr. Rand can be reached at (213) 825-4647 and has office 
hours daily from 1:00 pm to 2:00 pm. Dr. Rand suggested that 
NCBR take him a sample of data punched on cards to run 
through the word frequency and concordance programs. In this 
way, NCBR will be able to quickly determine if the programs are 
suitable. Additionally, Dr. Rand may know of other programs 
available at UCLA once he has a clear picture of the linguistic 
analysis requirements of the NCBR corpus. 

VI. Conclusions and Recommendations 

Eight computer programs which met at least one of the 
general criteria listed in Section IV were discussed in detail to 
determine whether or not they could be used to analyze the NCBR 
child language corpus. The first criterion was the extent to 
which a program could automate the linguistic analysis, 
and the second criterion was the amount of effort and 
cost of implementing the computer program on the UCLA IBM 
370/3033. 

Six projects met the first criterion, but were 
unsatisfactory in terms of the second criterion. These were: the 
two machine translation projects based on Junction Grammar in 
Provo, Utah; the machine translation project at the Linguistics 
Research Center at the University of Texas, Austin; General 
Inquirer II at Syracuse University; the Linguistic String 
Parser at New York University; the two projects at IBM Thomas J. 
Watson Research Center; and DIAGRAM at SRI International. Of 
these, DIAGRAM may be acceptable in terms of ease of 
implementation if a timesharing agreement between SRI 
International and NCBR could be negotiated. Problems still 
remain as to how adaptable DIAGRAM is to child language data. 

In terms of satisfying both criteria, CALAS appears to be 
the optimal choice. It is relatively sophisticated in terms of 
the linguistic analyses that it can perform and it should be 
fairly straightforward to install CALAS on the UCLA IBM 
370/3033. Additionally, a number of researchers have already 
used CALAS, so NCBR has the basis to adequately evaluate the 
program before deciding to use it. It is recommended that NCBR 
contact Dr. Pepinsky at Ohio State University and Dr. Meara at 
the University of Tennessee for a first-hand assessment of the 
capabilities of CALAS. 

Finally, the word frequency and concordance programs at 
UCLA best satisfy the second criterion but are deficient in terms 
of the complexity of the linguistic analysis they are able to 
perform. Since the use of these programs requires very little 
programming or technical support by NCBR, it is recommended that 
NCBR explore the possible analyses offered by these programs with 
Dr. Rand at UCLA. 

Appendix 

Automatic Linguistic Analysis 

This appendix consists of abstracts taken from Lockheed Dialog 
and NTIS searches. The abstracts are grouped in the following 
categories: 

Computer Models of Thought and Language ............... Section 1 
Theoretical Linguistic Models and Parsers ............. Section 2 
Machine Translation ................................... Section 3 
Concordance Program ................................... Section 4 
Automatic Linguistic Analysis, Outside the USA ........ Section 5 
Automatic Indexing and Text Analyses .................. Section 6 
Miscellaneous Automatic Language Processors ........... Section 7 

to have an underlying conceptual structure, it 
the machine structure its own experience, both 
uistic, in a manner concomitant with the human 
Some previous attempts at organizing the 
ase conceptually are discussed. A 
ependency grammar is posited as an interlingua 
an abstract representation of the underlying 
he conceptual dependencies are utilized as the 
a stratified system that incorporates 
zation rules to map from concepts and their 
s. In order to generate coherent sentences, a 
s posited that limits possible conceptual 
ents about the system's knowledge of the real 
been programmed; coherent sentences have been 
is operable. The entire system is posited as 
ry. (Author) 



DESCRIPTORS: (*Learning machines, Artificial intelligence), 
(*Programming languages, *Computational linguistics), English 
language, Semantics, Programming (Computers), Grammars, Theses 

PB-183 907 CFSTI Prices: HC$3.00 MF$0.95 



48 



09128 DOC YEAR: 1972 VOL NO: 48 ABSTRACT NO: 09128 

Understanding natural language. 

Winograd, Terry 

Massachusetts Inst. of Technology 
Cognitive Psychology 1972, Jan, Vol. 3(1), 191 p. 
CLASSIFICATION: 11 

Describes a computer system that answers questions, executes 
commands, and accepts information in an interactive English 
dialogue. It is based on the assumption that in modeling 
language understanding, we must deal in an integrated way with 
all of the aspects of language: syntax, semantics, and 
inference. The system contains a parser, a recognition grammar 

analysis turns out to be wrong, the linguist will first 
attempt to correct the theoretical model because it has been 
shown to be wrong, whereas those who disagree with the 
method can only adjust their theory in a haphazard fashion. 
Application and theory are thus dependent on each other. As a 
result, the linguist will benefit most by a combination of 
theory and practice. Any AA should have a double aim: (1) as a 
speculum for the model used; and (2) as being applicable in 
fields other than pure linguistics, e.g., in the description 
of languages. The ASA program is divided into paradigmatic and 
syntagmatic parts. Whereas in the paradigmatic analysis the 
relations that exist in words between actualized and potential 
valency are indicated, the syntagmatic analysis will indicate 
the relations between the words that constitute the sentence. 
The detailed ASA program can only be evaluated to the extent 
to which it can now live up to expectations in actual 
practice. The AA program has a double aim: (1) the 
verification of the model used for linguistic descriptions; 
and (2) if this model seems to satisfy present needs, the 
actual application of the model. Grammar can indeed be 
formalized, and as a result must be made machine-applicable, 
because this would seem to be the only way in which 
grammatical theories can be examined in order to avoid 
misleading interpretations made by the "understanding reader." 
The next step must be to evaluate the model used here with 
regard to the large group of existing theories. To this end, 
efforts in the field of the formalization of grammar and hence 
automatic analysis must be increased. 

Descriptors: DATA PROCESSING AND RETRIEVAL; APPLIED 
LINGUISTICS; SYNTAX; THEORETICAL LINGUISTICS 

Identifiers: automatic syntactic analysis; theory, 
methodology; 

Human Associative Memory 

BOOK AUTHOR: Anderson, John R. & Bower, Gordon H. 
Keenan, Janice M. 

U Denver, University Park CO 80210 

Language Sciences- 1976, 39, Feb, 30-32. CODEN: lasc-b 
Series: REVIEW 

New York: Wiley, Halsted Press, 1973. Research Center for the 
Language Sciences, Indiana University, 516 E. 6th St., 
Bloomington IN 47401 

Section Heading Codes: 4016 LANGUAGE: Engl. 
In recent years it has become apparent that the distinction 
between linguistic competence & linguistic performance is 
quite fuzzy. What is needed is a model that unites the two [...] 
that represents a speaker/hearer's knowledge of the 
language in terms of the rules or processes required to change 
from one mental state to the next. This is an impressive 
attempt at such a model. While the model suffers from its 
reliance on the traditional, yet questionable, tenets of 
associationism, the book does an excellent job of presenting & 
analyzing the problems involved in constructing a natural 
language processing system. It provides many insights for 
readers interested in the interface between competence & 
performance. 

Descriptors: VERBAL LEARNING; MEMORY; COMPETENCE AND 
PERFORMANCE; PSYCHOLINGUISTICS 

Identifiers: human associative memory; competence vs. 
performance; book reviews; 






Section 2 

Computational Understanding: Analysis of Sentences and Context 

Rockville, Md. (094120) 

Technical rept. 

AUTHOR: Riesbeck, Christopher 
:tt262S3 FLD: 5G, 92D USGRDR7505 

May 74 250p 

REPT NO: STAN-CS-74-437, AIM-238 
CONTRACT: DAHC15-73-C-0435, PHS-MH-06645 
PROJECT: ARPA Order-2494 
MONITOR: 18 

comprehending that text. 

DESCRIPTORS: *Computational linguistics, Natural language, Data 
processing, Speech recognition, Semantics, Syntax 

IDENTIFIERS: NTISDODA 

AD/A-005 040/1ST NTIS Prices: PC$7.50/MF$2.25 

7803SSO 7802550 

Tti« COtwalar and Llt«rary Studies 

SoSk sJJthOR: Altk.n. A. J.. Billiy. R. HB«1 1 ton-Sn. 1 th . 

N. (Eds> _ f, a 

Griinolatt. Danial C: TallaBHra, D. R.: Mar.tln W. 
StyT^- 1I74. 10. 3. .umm-r. 2<M-295. COOEN: »tyl-b 

"of«ltlng''l|n|ict1c **ioa1ty: Thr- *ug-*nt.d Tr.n.ltion 
Nitwbrk T«ehnlqu«s, 

(AtNl for*. ,P'^!^i;7t,,r Jtn, thrSugh thi *tNs. Each 

ridueai to finaing all possl^^la ^'^•"J", 4TN ^niratii ah 

iuccMifully tsrwihating P.t^throu^ 

•ccaptabl. par.lng ^'^*'^^.i"*~*-^**^^r1bSd alOhg with th. 
CdhvBhtlons for travarslng •■ch.^Th. S¥Q^ arid In 

riflird to ■PP'-«^'-i«^*^^»i'''^^"SVs^,^2d flHdlng m^^ 

ieciptabli pSth.__^^thro«g2^^>TN.^^^ ^ -^^j ind 

-^Backtracking,- „ S^wal tao«>u» ?f r?« • 5^,^., 

-iiipyt.t. ft*^ ,-"!i'of co-ibut.r iKScutlon: tlM. 

techniques are discussed in terms of 

STRUCTURAL BASIS OF CERTAIN CLASSES OF COMPLEX SENTENCES 
(STRUKTURNYE OSNOVY NEKOTORYKH KLASSOV SLOZHNOPODCHINENNYKH 
PREDLOZHENII) (V SVYAZI S VOPROSOM OB OMONIMII SOYUZOV) 

Foreign Technology Div Wright-Patterson AFB Ohio (141 600) 
AUTHOR: Kaplan, L. I. 

_50g2Cl FLD: 5G USGRDR6820 

25 Aug 67 39p 

REPT NO: FTD-TT-65-1893 

Nauchno-Tekhnicheskaya Informatsiya 

ABSTRACT: The author deals with the subject of complex 
subordination within a sentence in which homonymic connecting 
words are used. The relationship between the main and 
subordinate clauses, and the [...] of the sentence (i.e., how a 
word tends to govern, or is governed by other words, the 
presence of certain grammatical forms in words, etc.) are 
discussed. (Author) 

DESCRIPTORS: (*Machine translation, Russian language), [...] 
Semantics, Algorithms, Analysis, Computational linguistics 

IDENTIFIERS: Translations, Homonyms 

AD-673 a5t CFSTI Prices: HC$3.00 MF$0.95 



TRANSFORMATIONS AND DISCOURSE ANALYSIS PAPERS. 69. COMPUTABLE AND 
UNCOMPUTABLE ELEMENTS OF SYNTAX 

Pennsylvania Univ., Philadelphia. (278 950) 
AUTHOR: Hiz, Henry 
6S43Aa FLD: 5G, 917 USGRDR6924 

1967 18p 

GRANT: NSF-557 

ABSTRACT: A syntax of a language may be said to be computable in a 
different sense when it assigns, in a computable way, for each given 
usable text, all its relevant structures. One also may call a syntax 
computable if all its rules are decidable, in the sense that for each 
pair of texts it is decidable whether they are linked by the rule. 
(Author) 

DESCRIPTORS: (*Linguistics, Analysis), (*Syntax, Mathematics), 
Computational linguistics, English language 

IDENTIFIERS: Generative grammars, Strings (Linguistics) 

PB-186 413 CFSTI Prices: HC$3.00 MF$0.95 

CUiMMO TM964029 

The Sausage Machine: A New Two-Stage Parsing Model. 

Frazier, Lyn; Fodor, Janet Dean 

Cognition, v6 n4 p291-325 Dec 1978 

Language: ENGLISH 

The human sentence parsing device assigns phrase structure 
to sentences in two steps. The first stage parser assigns 
lexical and phrasal nodes to substrings of words. The second 
stage parser then adds higher nodes to link these phrasal 
packages together into a complete phrase marker. This model is 
compared with others. (Author/RD) 

Descriptors: *Language Processing/ *Linguistic Theory/ 
Models/ Phrase Structure/ Psycholinguistics/ Sentence 
Diagraming/ *Sentence Structure/ Syntax 
Identifiers: *Parsing 

ED037734 AL002361 

An Approach to the Semantics of Verbs. 

von Glasersfeld, Ernst 

Georgia Inst. for Research, Athens. 
^8p.; Paper delivered at the Southeastern 
Conference on Linguistics, Chapel Hill, North Carolina, April 
1970 

Sponsoring Agency: Air Force Office of Scientific Research, 
Arlington, Va. Directorate of Information Sciences. 
EDRS Price MF-$0.76 HC-$1.58 PLUS POSTAGE 

This paper explains a method of semantic analysis developed 
in the course of a natural-language research project that led 
to the computer implementation of the Parser. 
[...] an interlinguistic substratum of semantic particles 
of four different types (e.g. substantive, attributive, 
developmental, relational), a method is illustrated which 
makes it possible to map the meaning of activity words in 
context; the resulting mappings, on the one hand, incorporate 
much of what, hitherto, has been considered [...] and, 
on the other, they furnish an ex[...] of the semantic 
"deep structure" underlying the grammatical surface structure 
of a phrase or sentence. The mappings are here used to 
demonstrate semantic similarities and discrepancies between an 
English verb and the German verbs which are required for its 
translation in various contexts. (Author/FWB) 

Descriptors: Computational Linguistics/ *Deep Structure/ 
*English/ *German/ Mathematical Linguistics/ *Semantics/ 

Transformation & inference of tree grammars for syntactic 
pattern recognition 

Bhargava, B.K. 

IEEE Systems, Man and Cybernetics Society 1974 International 
Conference A744295 Dallas, Tex. 2-4 Oct 74 

IEEE Systems, Man and Cybernetics Society 
Conference Record No. 74CH0908-4 SMC. Inquire: Order Dept., 
Institute of Electrical and Electronics Engineers, 345 East 47 
St., New York, N.Y. 10017 

Descriptors: TRANSFORMATION; TREE; PATTERN; RECOGNITION 
SECTION HEADING: MATHEMATICS 
Section Class Codes: 6500 

75021505 v3n2 

On inference of tree grammars for syntactic pattern 
recognition 

Gonzalez, R.C. 

U of Tennessee, Knoxville, Tenn. 
IEEE Systems, Man and Cybernetics Society 1974 International 
Conference A744295 Dallas, Tex. 2-4 Oct 74 

IEEE Systems, Man and Cybernetics Society 

Conference Record No. 74CH0908-4 SMC. Inquire: Order Dept., 
Institute of Electrical and Electronics Engineers, 345 East 47 
St., New York, N.Y. 10017 

Descriptors: TREE; PATTERN; RECOGNITION 

SECTION HEADING: MATHEMATICS 
Section Class Codes: 6500 

Theoretical Issues in Natural Language Processing 

Papers from an Interdisciplinary Workshop in Computational 
Linguistics, Psychology, Linguistics, Artificial Intelligence. 
10-13 June, 1975, Cambridge, MA 

Nash-Webber, Bonnie; Schank, Roger 

Cambridge, MA: Yale Univ. Mathematical Soc. Sciences Board. 
1975: 219 pp. 

Doc Type: festschrift 

Descriptors: linguistics - collections, analyzed 
Descriptor Codes: 0301000000 

Natural language processing 

Joshi, A.K. 

1973 National Computer Conference A732237 New York, N.Y. 
4-8 Jun 73 

American Federation of Information Processing Societies 
Proceedings, 9 Jun 73; $40.00; Mr. T. C. White, American 
Federation of Information Processing Societies, 210 Summit 
Ave., Montvale, N.J. 07645. 

Descriptors: LANGUAGE; PROCESSING 

SECTION HEADING: GENERAL ENGINEERING AND TECHNOLOGY 
Section Class Codes: 5000 

Observations on Context Free Parsing 

Sheil, B. A. 

Statistical Methods in Linguistics- 1976, 71-109. CODEN: 
sit1h«'a 

Sprakforlaget Skriptor, P.O. Box 104 65 Stockholm 15, 
Sweden (Name changed to Journal of Linguistic Calculus after 
1976 volume) 

Section Heading Codes: 5113 
The principles underlying context free parsing are 
investigated. Use of a well-formed substring table is 
sufficient to achieve polynomially bounded parsing. On the 
basis of its presence in all known polynomial parsers, such a 
device may also be necessary to achieve this bound. The 
desirability of a parser automatically achieving tighter 
bounds for various subclasses of the context free grammars is 
examined & found to be dependent on the subclass concerned. 
It is argued that use of a transformed grammar by the parser 
is not necessarily a disadvantage, as has been previously 
claimed. As an illustration of these ideas, a variant of 
recursive descent parsing is developed & its behavior 
analyzed. This algorithm, when equipped with a well-formed 
substring table, is shown to be as efficient as any known 
general purpose context free parser, while its simple 
structure makes it easier to understand & prove correct. 
Modified HA 

Descriptors: CONTEXT FREE GRAMMAR; STRUCTURALIST LINGUISTIC 
THEORY 
Identifiers: context free parsing; 

Pattern-matching rules for the recognition of natural 
language dialogue expressions 

Colby, Kenneth Mark; Parkison, Roger C.; Faught, Bill 

Computer Science, Stanford U, CA 94305 

American Journal of Computational Linguistics- 1974, 1, 
Microfiche 5, 1-82. CODEN: ajcl-d 

Center for Applied Linguistics, 1611 N. Kent St., Arlington 
VA 22209 (Including The Finite String as of 1974, vol. 11, No. 
1) 

Section Heading Codes: 116 

Man-machine dialogues using everyday conversational English 
present difficult problems for computer processing of natural 
language. Grammar-based parsers which perform a word-by-word, 
parts-of-speech analysis are too fragile to operate 
satisfactorily in real time interviews allowing unrestricted 
English. In constructing a simulation of paranoid thought 
processes, an algorithm capable of handling the linguistic 
expressions used by interviewers in teletyped diagnostic 
psychiatric interviews was designed. The algorithm uses 
pattern-matching rules which attempt to characterize the input 
expressions by progressively transforming them into patterns 
which match, completely or fuzzily, abstract stored patterns; 
the power of this approach lies in its ability to ignore 
recognized and unrecognized words and still grasp the meaning 
of the message. The methods utilized are general and could 
serve any "host" system which takes natural language inputs. 
Appendices contain a sample interview, the dictionary, and a 
list of simple patterns. HA 

Descriptors: DYADIC INTERACTION; DATA PROCESSING AND 
RETRIEVAL; ENGLISH; MEANING; SPEECH RECOGNITION BY MACHINE 
Identifiers: algorithm for pattern-matching rules for 
computer recognition of natural English dialogue; 

Tebatio __ 7ib2i6b 

Junction Grammar as a Base for Natural Language Processing 
Lytle, Eldon G.; Packard, Dennis; Gibb, Daryl; Melby, Alan 
K.; Billings, Floyd H., Jr. 

Brigham Young U, Provo UT 84601 

American Journal of Computational Linguistics- 1975, 3, 77. 
CODEN: ajcl-d 

Center for Applied Linguistics, 1611 N. Kent St., Arlington 
VA 22209 (Including The Finite String as of 1974, Vol. 11, No. 
1) 

Section Heading Codes: 065 

Junction Grammar, a model of language structure developed by 
Eldon Lytle, is being used to define the interlingua for a 
machine-assisted translation project. Junction Grammar 
representations, called junction trees, consist of word-sense 
information interrelated by junctions, which contribute 
syntactic & semantic information. The 1st step of the current 
translation system is interactive analysis, during which the 
program interacts with the human operator to resolve 
ambiguities & then produces a junction tree representation of 
the meaning of the input text. The 2nd & 3rd steps of the 
translation process are automatic transfer & synthesis into 1 
or more target languages. For each target language the 
transfer step makes adjustments on each junction tree, if 
needed, before sending it to the synthesis program for that 
language. This translation system is currently under 
development at Brigham Young U. Present lexicons for English 
analysis, & Spanish, German, & Portuguese synthesis 
contain about 10,000 word-senses each. HA 

Descriptors: COMPUTATIONAL LINGUISTICS; MACHINE TRANSLATION; 
INTERNATIONAL LANGUAGES; AMBIGUITY; MEANING; ENGLISH; SPANISH; 
GERMAN; FRENCH; ROMANCE LANGUAGES 

Identifiers: junction grammar; model language structure for 
natural language processing, machine translation; 

ENGLISH DICTIONARY CLASSIFICATION 

Linguistics Research Center, Univ. of Texas, Austin. (208 250) 
AUTHOR: Lee, Ttiie C5it# 

0313ra FLD: 5G USGRDR6603 
Aug 65 29p 
REPT NO: LRC-65-WD-1 

GRANT: NSF-GN-308 

See also PB-166 656. Distribution: No limitation. 

ABSTRACT: The paper contains a description of the classification of 
English adjectives, nouns and verbs in the Linguistics Research 
System. Paradigms have been defined in chart form defining certain 
characteristics peculiar to subclasses to parts of speech for 
adjectives, nouns and verbs. Concise explanations of each subclass 
with examples are also given. All subclasses are ordered with the 
most frequently used subclasses listed first. 

DESCRIPTORS: (*English language, Classification), Computational 
linguistics, Semantics, Syntax, Machine translation, Dictionaries 

IDENTIFIERS: Adjectives, Nouns, Verbs 

PB-168 758 CFSTI Prices: PCS6.Gd HPSP..50 



Parsing of natural language sentences containing unknown 
words 

Dankel, D.D. 

U of Illinois, Urbana, Il. 

Association for Computing Machinery North-Central Regional 
Conference A771149 Urbana, Illinois 25-26 Mar 77 

Association for Computing Machinery (North Central Region) 

Proceedings, 26 Mar 77, $5 plus mailing costs: Student ACM, 
Dept of Computer Science, Univ. of Illinois, Urbana, IL 61820. 

Descriptors: LANGUAGE; UNKNOWNS; WORD 

SECTION HEADING: MATHEMATICS 
Section Class Codes: 6900 

AN AUTOMATED PHRASE STRUCTURE ANALYSIS OF A SPANISH TEXT 

Linguistics Research Center, Univ. of Texas, Austin. (208 250) 
AUTHOR: Thomas, Carolyn Beth 

baa<IE1 FLD: 5G USGRDR6610 
Sep 65 131p 
REPT NO: LRC-65-WD-2 
GRANT: NSF-GN-308 

ABSTRACT: A summary of morphological and syntactic classification is 
presented for a pilot description of Spanish in the Linguistics 
Research System. Sample displays are given for context-free phrase 
structure description and the resulting machine analysis. (Author) 

DESCRIPTORS: (*Spain, Language), (*Language, Spain), Context free 
grammars, Computational linguistics, Syntax 

PB-169 aee CFSTI Prices: PCSlJ.6d MF$1.00 



Semantic Directed Translation of Context Free Languages 

Ohio State Univ., Columbus. Computer and Information Science Research 
Center. *National Science Foundation, Washington, D.C. (407 586) 

Technical rept. 

AUTHOR: Buttelmann, H. William 
C5C«2Ki* FLD: 05G, 92D USGRDR7519 
Sep 74 39p 

REPT NO: OSU-CISRC-TR-74-6 
GRANT: NSF-GN-534.1 
MONITOR: 18 

ABSTRACT: A formal definition for the semantics of a context free 
language, called a phrase-structure semantics, is given. The 
definition is a model of the notion that it is phrases which have 
meaning and that the meaning of a phrase is a function of its 
syntactic structure and of the meanings of its constituents. Next the 
author gives a definition for translation on context free languages. 
He then studies a certain kind of translation on cfl's, which proceeds 
by translating on the phrase trees of the languages, and is specified 
by a finite set of tree-replacement rules. The author presents a 
procedure which, given a cfg and phrase-structure semantics for a 
target language, will (usually) produce the finite set of 
tree-replacement rules for the translation, if the translation exists. 
The procedure may be viewed as a computer program which is a 
translator generator, and which produces another program that is a 
translator. 

DESCRIPTORS: *Phrase structure grammars, *Semantics, *Machine 
translation, Syntax, Computational linguistics, Recursive functions 

IDENTIFIERS: Phrase structure semantics, *Context free 
languages, NTISNSFSIS 
PB-242 SSvaST NTIS Prices: PC$3.75/MF$2.25

Section 3

Syntactic Analysis of the Russian Sentence
IBM Watson Research Center Yorktown Heights N Y (3«S 250)

Final rept. May 65-May 67

AUTHORS: Plath, Warren J.; Andreyewsky, Alexander; Strom, Robert E.;
Lippmann, Erhard O.

D1773SC FLD: 5G, 5E d7">09

Oct 67 170p

Contract: AF 36 (632) -3782

Project: AF-4599

Monitor: RADC-TR-67-ae«

Distribution limitation now removed.

Abstract: The report describes results of a two year research effort
in the field of automatic syntactic analysis of Russian within the
framework of Russian-English machine translation R and D. The primary




Russian was conducted in parallel an extension of the research
effort initiated at Harvard University with the NSF support.
Performance of the predictive analyzer on the test corpus of 160
Russian sentences is described.

Descriptors: (*Russian language, *Syntax), (*Machine translation,
Russian language), Computational linguistics, Automatic, English
language, Computer programs, Programming languages, Algorithms,
Combinatorial analysis, Dictionaries, Subroutines, Linguistics

Identifiers: Syntactic analysis, KII£fCEXE

AD-e2a .?51/'iST NIIS Prices: PC ft3d/CF AC1 

Machine Translation (A Bibliography with Abstracts)

National Technical Information Service, Springfield, Va. (391 812)
Rept. for 1964-Feb 75

AUTHOR: Lehmann, Edward J.; Young, Mary E.

C«65aD3 FLD: 05G, 09B, 92D*, 88, 62, 86U USGRDR7513

May 75 132p*

MONITOR: 18

Supersedes COM-73-11717.

ABSTRACT: Studies on machine translation of various languages are
presented as abstracts in this bibliography of Federally-funded
research reports. Topics concerning syntax, computer programming,
computer hardware, and semantics are included. (Contains 127
abstracts).

DESCRIPTORS: *Machine translation, *Bibliographies, Computational
linguistics, Syntax, Semantics, Computer programming, Vocabulary,
Translating

IDENTIFIERS: NTISNTIS

NTIS/PS-75/«*11/9ST NTIS Prices: PC$25.00/MF$25.00

Yale Univ New Haven Dept of Computer Science (ttO"?Oc;i)

fgi^Qo^^^l^ r^t)*» - - - - - — _ . 

AUTHOR: Carbonell, Jaime G.; Cullingford, Richard E.; Gershman,
Anatole V.

»>er ''8 f Ip 

Contract: vobo 1 o - 1 1 1

Monitor: 18

Availability: Microfiche copies only.

Abstract: This paper discusses knowledge-based machine translation
research at Yale University Artificial Intelligence laboratory. Our
paradigm, illustrated by several working computer programs, is to
analyze the source text into a language-free representation, apply
world knowledge to infer information implicit in the input text, and
generate the translation in various target languages. (Author)

Descriptors: *Machine translation, *Artificial intelligence,
*Computational linguistics, Natural language, Information processing

Identifiers: Knowledge, NTrTrrXA

Research on Chinese-English Machine Translation
California Univ Berkeley (071 850)

Final technical rept. 1 Jul 67-31 Jul 69

AUTHOR: Wang, William S-Y; Dougherty, Ching-Yi; Doughty, Herbert III;
Johnson, Douglas; I4e, Sally H.
D0221I3 FLD: 5G d77C2
Feb 69 U6i

Contract: F30602-67-C-0347
Project: AF-4599

Monitor: RADC-TR-68-570

Distribution limitation now removed.

Abstract: The report documents results of a 13-month effort in
Chinese-English machine translation R and D. Main emphasis was placed
on design of automatic lookup system for segmentation of Chinese text
into units of meaning, and design of automatic syntactic analysis
system for recognition of Chinese sentence structure. The following
tasks were progressing concurrently: further compilation of lexical
data with refined grammar codes, and continuing sophistication of
rules for automatic syntactic analysis. Completion of Syntactic
Analysis System (SAS) and associated subroutines constitutes a major
achievement. Continuation phase will be devoted mainly to interlingual
transfer problem and synthesis in English, culminating in design of a
prototype system for Chinese-English machine translation. (Author)

Descriptors: (*Chinese language, *Machine translation), Syntax,
Computational linguistics, English language

Identifiers: N^ISCODXE

AD-ISO CC5/2S1 NTIS Prices: PC Ae3/FF ftei

Machine Translation (A Bibliography with Abstracts)
National Technical Information Service, Springfield, Va. (391 812)
Rept. for 1964-May 76

AUTHOR: ^°»J|^/||| FLD: 05G, 09B, 92D*, 88, 62, 86U USGRDR7615

ABSTRACT: Studies on machine translation of various languages are
cited. Topics concerning syntax, computer programming, computer
hardware, ... (This updated bibliography contains 136 abstracts,
... of which are new entries to the previous edition.)

Translating

IDENTIFIERS: Foreign languages, NTISNTIS
NTIS/PS-76/G.3U/1ST NTIS Prices: PC$25.00/MF$25.00



7305019 7305019

Automatic translation of natural languages

Information & Computer Science, U. California, Irvine
Daedalus- 1973, 102 (3), 217-230. CODEN: daaa-a
280 Newton St., Brookline, Mass. 02146

Section Heading Codes: 045

A consideration of attempts to build a translating machine
for natural languages as well as a discussion of problems in
the study of meaning. Although withdrawal of government ...
has caused a loss of interest in automatic
translation, some systems have been developed including: (1)
the Mark II translator; (2) the Georgetown program; (3)
the Rand Corporation's MIND system. A fourth system is also
proposed in which material would be translated into a language
so constructed that each foreign word ...
replaced by a counterpart in an artificial language (a
one-to-one correspondence) which would be much ... to learn
than the foreign language itself. Computers are now being
used to study meaning through programs that mimic human
behavior for processing of textual data. It was thought that
different sets of requirements ...
... that it would be necessary to design essentially different
algorithms for basic linguistic processes. It ...
the best algorithms will be variants of an overall
strategy. Three strategies have been proposed for obtaining
structures for arbitrary sentences. Besides the problems
in syntactic analysis, there are many problems ...
... computational linguistics ... it is in
... synthetic
... of natural language.
problems of meaning;

A Short Concordance to Laurence Sterne's Sentimental Journey
Through France and Italy by Mr. Yorick. Volume I. A-L

Illinois Univ., Urbana. Dept. of Computer Science. *Princeton Univ.,
N.J. Dept. of Statistics. *National Science Foundation, Washington,
D.C. Div. of Computer Research. (176 011)

AUTHOR: Pasta, Betty B.; Pasta, David J.; Pasta, John
C3793I« FLD: 5B, 88E USGRDR7426
Sep 74 227p

REPT NO: UIUCDCS-R-74-676-Vol-1
MONITOR: 18

See also Volume 2, PB-236 233. Prepared in cooperation with Princeton
Univ., N.J. Dept. of Statistics and National Science Foundation,
Washington, D.C. Div. of Computer Research.



ABSTRACT: The Concordance to Laurence Sterne's last work, A
Sentimental Journey through France and Italy by Mr. Yorick, employs a
KWIC (KeyWord-in-Context) format which centers the word on the page and
includes the words of text immediately preceding and following. The
keyword types are in alphabetic order listed with each token given in
order of appearance in the text. In the listing, special symbols
precede the alphabet and numerals follow the alphabet. A
word-frequency list containing all the words in the Journey is
included. Some high-frequency function words were blocked in the
concordance, and this reduced its size from 40,635 to 26,188 lines.
Blocked words include certain articles, personal pronouns, parts of
verbs to be and to have, and prepositions in, of, and to.
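
The KWIC layout described in this abstract can be illustrated with a
short sketch. It is not the program used by the authors; the blocked
word list, the function names, and the context width are assumptions
chosen for illustration only. Keywords are centered in a fixed column,
blocked words are skipped, and entries are sorted so that special
symbols precede letters and numerals follow them.

```python
import re
from collections import Counter

# Hypothetical stop list standing in for the "blocked" function words.
BLOCKED = {"a", "an", "the", "i", "he", "she", "it", "we", "you", "they",
           "am", "is", "are", "was", "were", "be", "been",
           "has", "have", "had", "in", "of", "to"}


def tokenize(text):
    """Split text into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def collation_key(word):
    """Sort key: special symbols first, then letters, then numerals."""
    def rank(ch):
        if ch.isalpha():
            return (1, ch)
        if ch.isdigit():
            return (2, ch)
        return (0, ch)
    return [rank(ch) for ch in word]


def kwic(text, width=30, blocked=BLOCKED):
    """Return (keyword, line) pairs with the keyword in a fixed column."""
    tokens = tokenize(text)
    rows = []
    for i, tok in enumerate(tokens):
        key = tok.lower()
        if key in blocked:
            continue  # blocked words get no concordance entry
        left = " ".join(tokens[:i])[-width:]
        right = " ".join(tokens[i + 1:])[:width]
        rows.append((key, i, f"{left:>{width}}  {tok}  {right}"))
    # Types in collation order, tokens in order of appearance in the text.
    rows.sort(key=lambda r: (collation_key(r[0]), r[1]))
    return [(key, line) for key, _, line in rows]


def word_frequencies(text):
    """Word-frequency list over all words, blocked or not."""
    return Counter(tok.lower() for tok in tokenize(text))


if __name__ == "__main__":
    sample = "The cat sat in the hat. 7 cats sat."
    for _, line in kwic(sample, width=20):
        print(line)
```

Blocking is applied only to the concordance lines; the frequency list
still counts every word, which matches the abstract's statement that
the word-frequency list contains all the words in the text.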

DESCRIPTORS: *Coordinate indexing, *Books, *Indexes (Documentation),
Data processing, Computational linguistics, Information retrieval,
Words (Language), Literature (Fine arts), English language

IDENTIFIERS: *Concordances, Permuted indexes, NTIsnu, NTISNSF

PB-236 232/SSL NTIS Prices: PC$7.50/MF$2.25

AUTOMATIC LINGUISTIC CLASSIFICATION

Linguistics Research Center, Univ. of Texas, Austin. (208 250)

AUTHOR: Pendergraft, Eugene D.; Dale, Nell

_0313F3 FLD: 5G USGRDR6603

Nov 65 46p

REPT NO: LRC-65-WAT-1

CONTRACT: DA-36-039-AMC-02162 (S)

GRANT: NSF-GN-308

Distribution: No limitation.

ABSTRACT: The work plan of a long-range series of experiments in
automatic linguistic classification is described, together with
discussion of a first experiment. The latter is concerned with
category identification. In particular, the data resulting from
automatic syntactic analysis of English were used to identify
syntactical categories which have similar membership. The series of
experiments will combine the use of automatic linguistic analysis and
automatic classification techniques. Automatic syntactic analysis,
and in later experiments semantic analysis, will be performed within
the Linguistics Research System (LRS). Automatic classification will
be carried out within the Automatic Class System (ACS). A
programming interface is being constructed between the two systems so
that their combined capabilities can be used for automatic linguistic
classification and partial self-organization.

DESCRIPTORS: (*Linguistics, Classification), *Automa...,
English language,

A Short Concordance to Laurence Sterne's Sentimental Journey
through France and Italy by Mr. Yorick. Volume II. M-Z

Illinois Univ., Urbana. Dept. of Computer Science. *Princeton Univ.,
N.J. Dept. of Statistics. *National Science Foundation, Washington,
D.C. Div. of Computer Research. (176 011)

AUTHOR: Pasta, Betty B.; Pasta, David J.; Pasta, John R.

C3793J1 FLD: 5B, 88E USGRDR7426

Sep 74 248p

REPT NO: UIUCDCS-R-74-676-Vol-2

MONITOR: 18

See also Volume 1, PB-236 232. Prepared in cooperation with Princeton
Univ., N.J. Dept. of Statistics and National Science Foundation,
Washington, D.C. Div. of Computer Research.

ABSTRACT: The short concordance to Laurence Sterne's A Sentimental
Journey Through France and Italy by Mr. Yorick contains 26,188 words
of the 40,635 word text. Blocked words include certain articles,
personal pronouns, parts of the verbs to be and to have, and the
prepositions in, of, and to. The text was divided into logical
episodes, and each word was tagged with the number of the episode in
which it appears.

DESCRIPTORS: *Coordinate indexing, *Books, *Indexes (Documentation),
Data processing, Computational linguistics, Information retrieval,
Words (Language), Literature (Fine arts), English language

IDENTIFIERS: *Concordances, Permuted indexes, NTISiUU, NTISNSF

PB-236 233/3SL NTIS Prices: PC$7.50/MF$2.25



A Computer-Aided Investigation of Linguistic Performance: Normal and
Pathological Language

Iowa Univ Iowa City Dept of Mathematics (404511)
Technical rept.

AUTHOR: Wachal, Robert S.; Spreen, Otfried

A1205A1 FLD: 5G, 56J USGRDR7101

Jul 70 22p

REPT NO: THEMIS-UI-TR-29

CONTRACT: N06eiu-68-A-a5ec

Report on the Theory and Applications of Automaton Theory.

ABSTRACT: A system of twenty FORTRAN and PL/1 programs, developed for
an analysis of aphasic and normal speech transcripts, is described in
detail. The programs aid in lexical, grammatical, paralinguistic, and
statistical analyses as well as in data preparation and correction.
They can also be used in schizophrenic and other kinds of pathological
language and are adaptable to the analysis of written-language samples
and the investigation of authorship and style. (Author)

DESCRIPTORS: (*Speech, *Computational linguistics), Performance (Human),
Pathology, Computers, Psychiatry

IDENTIFIERS: PL/1 programming language, FORTRAN, Psycholinguistics,
Themis project

AD-7ia lilU NTIS Prices: ?CS3.6c HFSC.95

Spoken Language Vocabulary and Structural Frequency Count - Japanese Data
Analyses

Syracuse Univ Research Corp N Y (339750)

Special rept. 1 Jul 72-30 Jun 73

AUTHOR: Sukle, Robert J.; Miron, Murray S.; Pratt, Charles C.

2S92I1 FLD: 5G, 92D USGRDR7410

Jun 73 465p

REPT NO: SURC-TR-73-228

CONTRACT: DAAG05-72-C-0574

MONITOR: 18

ABSTRACT: The report is a frequency analysis

for corpus of elicited words. (Author)

DESCRIPTORS: *Words (Language), *Vocabulary, Counting, Computational
linguistics, Semantics, Speech

IDENTIFIERS: *Japanese language, *Word frequency, Etymology, SD
AD-775 925/i NTIS Prices: PC$26.25/MF$1.45



Spoken Language Vocabulary and Structural Frequency Count: English
Data Analyses

Syracuse Univ Research Corp N Y (339750)

Special rept. 1 Jul 72-30 Mar 73

AUTHOR: Miron, Murray S.

2592H4 FLD: 5G, 92D USGRDR7410

Mar 73 322p

REPT NO: SURC-TR-73-117

CONTRACT: DAAG05-72-C-0574

MONITOR: 18

ABSTRACT: The report is a frequency analysis of vocabulary and
sentence patterns in the English language. The corpora used are a
media sample, a discussion session, elicited sentences, and words
elicited for frame sentences. The outputs are the following
frequency tables: (a) semantic frequency of combined corpus (media,
discussion, elicited sentences) listed alphabetically with
inflectional and derivational variants as sub-entries; (b) semantic
frequency of combined corpus listed by frequency; (c) sentence
pattern frequency from corpus of elicited sentences; (d) H-ranks and
phi-coefficients for corpus of elicited words. (Author)

DESCRIPTORS: *Words (Language), *Vocabulary, Frequency, Computational
linguistics, Speech, English language

IDENTIFIERS: *Word frequency, Etymology, SD

AD-775 924/4 NTIS Prices: PC$19.25/MF$1.45

ED133568 CS203088

Degrees of Syntactic and Rhetorical Fluency-Competency in
Freshman Writing: A Computer-Assisted Study.
Chisholm, William

77 7p.; Paper presented at the Annual Meeting of the
Midwest Modern Language Association (18th, St. Louis,
Missouri, November 4-6, 1976)

EDRS Price MF-$0.83 HC-$1.67 Plus Postage.

An exploratory study of quantitative measurement of
syntactic and rhetorical fluency examined students' writing
near the beginning and near the end of a two-quarter freshman
English program. The syntactic analysis focused on the clause,
which was classified according to basic syntactic type and
elaborating syntactic structures. The rhetorical analysis
concentrated on the orthographic unit and included counts of
selected rhetorical features and counts of logical
relationships between successive units of thought. Preliminary
results are reported, though in general the measures chosen
did not discriminate between the 20 compositions written at
the beginning of the program and the 20 written at the end.
(AA)

Descriptors: College Freshmen/ *Composition Skills
(Literary)/ Higher Education/ Language Fluency/ *Language
Patterns/ Language Research/ *Rhetoric/ *Syntax



7502755 7502755

A literary analysis by computer

Waltman, Franklin M.

Foreign Languages State U New York Coll Cortland 13045
Hispania- 1974, 57, 4, Dec, 893-898. CODEN: hisn-b



7304688 7304688

A computer-assisted study of the vocabulary of young Navajo
children

Spolsky, Bernard; Holm, Wayne; Holliday, Babette; Embry,
Jonathan

Linguistics, U. New Mexico

Computers and the Humanities- 1973, 7 (4), 209-218. CODEN:
cohu-a



7935243 79-3-000653

Semi-Automatic Construction of Semantic Concordances
Fraenkel, A. S.; Raab, D.; Spitz,

Computers and the Humanities. US ISSN 0010-4817. Flushing,
NY. 1979. 13:283-88



ED108633 IR002150

Design Document: KWIC Module: L.A.P. Version I.

Southwest Regional Laboratory for Educational Research and
Development, Los Alamitos, Calif.
26 May 72 9p.

Report No.: SWRL-TN-5-72-37

EDRS Price MF-$0.76 HC-$1.58 PLUS POSTAGE

The Language Analysis Package (LAP) was developed by the
Southwest Regional Laboratory (SWRL) to assist ...
the analysis of language usage. The function of the KWIC
(Keyword in Context or Concordance) Module of the LAP ...
produce keyword listings from the input text being analyzed.
... contain location information ...
document identifier, page, paragraph, ...
features are presented in this document together with the file

Section 4

Spoken Language Vocabulary and Structural Frequency Count - Swahili
Data Analyses

Syracuse Univ Research Corp N Y (339750)

Special rept. 1 Jul 72-30 Jun 73

AUTHOR: aubaiar, Ibrahim; Miron, Murray S.; Pratt, Charles C.

C259212 FLD: 52D USGRDR7410

Jun 73 301p

REPT NO: SURC-TR-73-229

CONTRACT: DAAG05-72-C-0574

MONITOR: 18

ABSTRACT: The report is a frequency analysis of vocabulary and
sentence patterns in the Swahili language. The corpora used are a
media sample, a discussion session, elicited sentences, and words
elicited for frame sentences. The outputs are the following frequency
tables: (a) semantic frequency of combined corpus (media, discussion,
elicited sentences) listed alphabetically with inflectional and
derivational variants as subentries; (b) semantic frequency of
combined corpus listed by frequency; (c) sentence pattern frequency
from corpus of elicited sentences; (d) H-ranks and phi-coefficients
for corpus of elicited words. (Author)
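
Output tables (a) and (b) of this abstract can be illustrated with a
small sketch. The abstract does not define how "semantic frequency"
was computed, so the sketch assumes that inflectional and derivational
variants are simply collapsed under a base form; the variant table and
function names below are hypothetical and a real count would need a
full lexicon for the language concerned.

```python
from collections import Counter, defaultdict

# Hypothetical variant -> base-form table (assumption, English examples).
BASE_FORM = {"runs": "run", "running": "run", "ran": "run",
             "houses": "house"}


def frequency_tables(tokens, base_form=BASE_FORM):
    """Build an alphabetical table with subentries and a ranked table."""
    variants = defaultdict(Counter)
    for tok in tokens:
        word = tok.lower()
        # Each token is counted under its base form, keeping the variant.
        variants[base_form.get(word, word)][word] += 1

    totals = {base: sum(c.values()) for base, c in variants.items()}

    # Table (a): base forms alphabetically, variants as subentries.
    alphabetical = [(base, totals[base], sorted(variants[base].items()))
                    for base in sorted(totals)]
    # Table (b): base forms by descending frequency, ties alphabetical.
    by_frequency = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return alphabetical, by_frequency


if __name__ == "__main__":
    corpus = "run runs house ran houses run dog".split()
    table_a, table_b = frequency_tables(corpus)
    for base, total, subs in table_a:
        print(f"{base:<10}{total:>4}")
        for variant, n in subs:
            print(f"    {variant:<10}{n:>4}")
    print(table_b)
```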

DESCRIPTORS: *Words (Language), *Vocabulary, Counting, Computational
linguistics, Speech

IDENTIFIERS: *Swahili, African languages, Word frequency, SD

AD-775 926/9 NTIS Prices: PC$18.25/MF$1.45

Manual for the Development of Language Frequency Counts

Syracuse Univ Research Corp N Y (339750)

Special rept. 1 Jul 72-30 Jun 73

AUTHOR: Miron, Murray S.; Pratt, Charles C.
2592H3 FLD: 5G, 92D USGRDR7410
Jun 73 58p

REPT NO: SURC-TR-73-235
CONTRACT: DAAG05-72-C-0574
MONITOR: 18

ABSTRACT: As part of a continuing project of language analysis, SURC
presents its final manual. This manual is an explanation of the
procedures used to collect and analyze data for this project. After
explaining the theory and application of the methodology, the manual
discusses specific problems encountered in the design, administration
and analysis of the language data collected. (Modified author
abstract)

DESCRIPTORS: *Vocabulary, *Words (Language), Computational linguistics,
Semantics, Manuals

IDENTIFIERS: Word frequency, Etymology, SD
AD-775 923/6 NTIS Prices: PC$6.00/MF$1.45



7804072 7804072

Trends in Computer Applications to Literature
Widmann, R. L.

Computers and the Humanities- 1975, 9, 5, Sept, 231-235.

User's Guide to the SOLAR Semantic Analysis File
Technical rept.

AUTHOR: Eya, Tom; Diller, Timothy; Olney, John
|?6«3KU FLD: 5G, 9B, 92D, 623-' DSGiDmi!

Apr 75 39p

REPT NO: SDC-TM-5292/001/00

CONTRACT: DAHC15-73-C-0080, ARPA Order-2254
MONITOR: 18

... a general explanation of the semantic
analysis file of SOLAR (a Semantically-Oriented Lexical Archive). It
is intended as an introduction and reference manual for the ...
... collector. The document includes
the design concepts, the resulting file structure, the intended ...
content, retrieval procedures, and data collection procedures.

DESCRIPTORS: *Semantics, *Speech recognition, English language,
... processing, Computational linguistics,

IDENTIFIERS: NTISDODA

AD-A009 328/6ST NTIS Prices: PC$3.75/MF$2.25

Phrase Dictionary Distribution Analysis and Growth Prediction Report
Cryptanalytic Computer Sciences Inc Cherry Hill N J (406482)
Final rept. 26 Jan-26 Apr 74

AUTHOR: Waite, J. H.; Boebi; Fisher, J. G.; Epstein, S. D.;
Stewart, D. J.

C3114K4 FLD: 5G, 5B, 92D, 88B USGRDR7417

26 Apr 74 56p

CONTRACT: D&iA21-74-C-0269

MONITOR: 18

ABSTRACT: The report describes a study of the DDC Phrase Glossary. It
includes a computer program to tabulate word frequencies for blocks of
phrases of optional sizes. On the basis of these distributions,
empirical and statistical analyses are made including two prediction
models. Two-word distributions are also included. Based upon the
available distributions, a two-word Phrase Glossary size of 320,000
two-word phrases was determined. Also included are analysis of various
techniques, such as suffix truncation, imbedded phrases, and query
effectiveness. Comparisons are ... DDC system to other plain
language machine retrieval systems. (Author)
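
The tabulation step mentioned in this abstract (word frequencies for
blocks of phrases of optional sizes) might look like the following
sketch. It is not the program described in the report: the function
names and the vocabulary-growth summary are assumptions added for
illustration, and the report's two prediction models are not
reproduced here.

```python
from collections import Counter


def block_word_frequencies(phrases, block_size):
    """Tabulate word frequencies separately for each block of phrases."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    tables = []
    for start in range(0, len(phrases), block_size):
        block = phrases[start:start + block_size]
        tables.append(Counter(word.lower()
                              for phrase in block
                              for word in phrase.split()))
    return tables


def vocabulary_growth(tables):
    """Cumulative number of distinct words after each successive block."""
    seen = set()
    growth = []
    for table in tables:
        seen.update(table)
        growth.append(len(seen))
    return growth


if __name__ == "__main__":
    glossary = ["machine translation", "machine retrieval",
                "phrase glossary", "phrase structure"]
    tables = block_word_frequencies(glossary, block_size=2)
    print(tables)
    print(vocabulary_growth(tables))
```

A growth curve of this kind is one plausible input to a size
prediction, since it shows how quickly new words stop appearing as
more phrases are added.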

DESCRIPTORS: *Information retrieval, *Dictionaries, Words (Language),
Occurrence, Models, Predictions, Computational linguistics, Computer
applications

IDENTIFIERS: Phrase structure, NTISDODA
AD-780 957/7 NTIS Prices: PC$3.75/MF$1.45

The SQAP Data Base for Natural Language Information

Research Inst. of National Defense, Stockholm (Sweden). (402 800)
AUTHOR: Palme, Jacob

C5112J2 FLD: 05G, 92D USGRDR7520
Jul 75 79p

REPT NO: FOA-P-C8376-nE(I5)
MONITOR: 18

ABSTRACT: The Swedish Question Answering Project (SQAP) aims at
handling many different kinds of facts, and not only facts in a small
special application area. The SQAP data base consists of a network of
nodes corresponding to objects, properties and events in the real
world. Deduction can be performed, and deduction rules can be input in
natural language and stored in the data base. This report describes
the data base, specially focusing on problems in its design, both
problems which have been solved and problems which are not yet solved.
Specially full treatment is given to the data base representation of
natural language noun phrases, and to the representation of deduction
rules in the data base in the form of data base patterns.

DESCRIPTORS: *Computational linguistics, Computer programming,
Artificial intelligence, Semantics, Words (Language), English language,
Sweden

IDENTIFIERS: Swedish question answering project, NIISSWPIND
PB-243 783/8ST NTIS Prices: PC$4.75/MF$2.25



7704731 7704731

Automatische Lemmatisierung -- Zielsetzung und Arbeitsweise
eines linguistischen Identifikationsverfahrens (Automatic
Lemmatization: Goals and Procedures of a Linguistic
Identificational Program)

^•6mr^ Wfl"?

U Saarlandes, 6600 Saarbrucken, Federal Republic of Germany
Linguistische Berichte- 1976, 44, Aug, 30-47. CODEN:
lgbr-a

Friedrich Vieweg & Sohn, P. O. Box 5829, D-6200 Wiesbaden,
Federal Republic of Germany

Section Heading Codes: 4610 LANGUAGE: Ger

The goals of this project are identifying & specifying word
forms within a text by means of a large dictionary (about
100,000 stems with syntactic & semantic specifications) & a
grammatical component. Word forms within a text are to be
specified with regard to their lexical codification &
linguistic context. The procedure has 5 steps: (1) analysis
of inflectional variants & retrieval of stems; in case of
lexical ambiguity, detection of the various readings as
offered by the dictionary, (2) detection of discontinuous verb
constituents -- a special problem of German (e.g., er ging vor
vielen Jahren in der Fremde verloren (he was lost abroad
for many years)) & reconstruction of the compound stem
("verlorengehen" (be lost)), (3) disambiguation of
syntactic homographs (e.g., English "leaves" verb/noun or
German billige (just/equitable) verb/adjective) by
distributional analysis, (4) identification of idiomatic
expressions consisting of several verbal units (e.g., English
"to kick the bucket" or German die Kurve kratzen) -- in this
case a special dictionary component is used, & (5)
disambiguation of semantic homographs by means of selectional
restrictions in connection with a rough specification of the
syntactic structure of a sentence. AA
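
Step (1) of the procedure above, analysis of inflectional variants
with retrieval of every reading the dictionary offers, can be sketched
as follows. This is only an illustration: the miniature English
dictionary, the suffix rules, and the function name are assumptions,
whereas the project described used a German dictionary of about
100,000 stems together with a grammatical component.

```python
# Hypothetical mini-dictionary: stem -> list of (part of speech, gloss).
DICTIONARY = {
    "leave": [("verb", "to go away")],
    "leaf": [("noun", "part of a plant")],
    "kick": [("verb", "to strike with the foot"),
             ("noun", "a blow with the foot")],
}

# Hypothetical inflection rules: (ending to strip, replacement).
# The empty rule lets an uninflected form match the dictionary directly.
RULES = [("ves", "f"), ("s", ""), ("ed", ""), ("ing", ""), ("", "")]


def candidate_readings(word_form):
    """Return every (stem, part of speech, gloss) the dictionary offers."""
    form = word_form.lower()
    readings = []
    for ending, replacement in RULES:
        if form.endswith(ending) and len(form) > len(ending):
            stem = form[:len(form) - len(ending)] + replacement
            for pos, gloss in DICTIONARY.get(stem, []):
                if (stem, pos, gloss) not in readings:
                    readings.append((stem, pos, gloss))
    return readings


if __name__ == "__main__":
    # "leaves" is ambiguous between a noun and a verb reading; a later
    # step (distributional analysis) would have to choose between them.
    for reading in candidate_readings("leaves"):
        print(reading)
```

Returning all readings rather than the first match mirrors the
abstract's design, in which ambiguity is detected at step (1) and
resolved only by the later disambiguation steps.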

Descriptors: COMPUTATIONAL LINGUISTICS; DATA PROCESSING AND
RETRIEVAL; GERMAN; DICTIONARY; AMBIGUITY; DISCOURSE ANALYSIS
Identifiers: automatic lemmatization of German word forms;

Section 5

7«b392a 7663928

Toward a Generative Dependency Grammar

U Cologne, Federal Republic of Germany

Lingua- 1975, 36, 2-3, Jun, 121-145. CODEN: ling-a

North-Holland Publishing Company, P. O. Box 211, Amsterdam,
the Netherlands

Section Heading Codes: 050

... grammar with a deep structure built on ...
rather than on phrase structure relations. In Dependency Based
Transformational Grammar (Research Report RC-1889, Yorktown
Heights, NY: IBM Watson Res Ctr.), Robinson argues that the
concept of head cannot be formalized within the framework of a
phrase-structure categorial component, but that it can be
formally specified for each phrase, if dependency rules
... the structural strings of categories, thus supplying
additional information needed for some of the transformations.
In this paper, an attempt is made to overcome the
shortcomings in Robinson's model by modifying her dependency
rules & adding semantic specifications to the dependents of V,
taking into account some of the considerations that led
Fillmore to make up his "cases." H*

Descriptors: TRANSFORMATIONAL AND GENERATIVE GRAMMAR; TESNIE
Identifiers: theory of generative dependency grammar;
valence, dependency vs. phrase structure grammar, Tesniere,
Fillmore;

7890024 7890024

A Swedish Lexical Data Base

Allen, Sture; Ralph, Bo

Sprakdata Goteborgs U, Norra Allegatan 6 S-413 01 Sweden

Series: AICA 1978 0007

A lexical data base for present-day Swedish is in the
process of being developed at ... department of
natural-language processing, U of Goteborg. The lexical
material is drawn from authentic texts. Large samples of
words with their contexts still traceable are available
through the Swedish Logotheque, which maintains word & text
banks in machine readable form. The linguistic analysis is
carried out interactively, using an adapted form of case
grammar. Linguistic information includes grammatical
constructions; semantic definitions; morphotactic properties
of the items; phonetic/phonological, graphonomic, stylistic, &
statistical data; a brief etymological note. The
definitions contain words reducible to a minimal list of
defining words. These defining units are regarded as
indivisible primitives. A controlled defining vocabulary is
used to avoid circularity in the definitions. This data base
may have a number of uses. The sophisticated form of storage
employed allows the material to be approached in several ways;
the material can also be immediately restructured in the way
the linguist chooses. The data base's most obvious use,
however, is for dictionary production. The first thing
generated from the data base will be an unconventional
monolingual Swedish dictionary which will reflect the
distinguishing features of the data base.

Descriptors: LEXICOLOGY; GERMANIC LANGUAGES; VOCABULARY;
DICTIONARY

Identifiers: Swedish lexical data base;

75008184 v3n1

Technique for parsing ambiguous languages

4th Annual Meeting of Society for Informatics B744204
Berlin, Ger (FR) 9-12 Oct 74

Society for Informatics

Papers (Eng or Ger) in Lecture Notes in Computer Science,
end ?97l; approx. DM40; Inquire: Springer Verlag, 175 Fifth
Ave, New York, N.Y.

Descriptors: LANGUAGE

SECTION HEADING: GENERAL ENGINEERING AND TECHNOLOGY
... Syntactic Model in Several Procedures

Dependency grammar

Kunze, Jurgen ... Interdisciplinary Journal of the Language

linguistics, descriptive grammar ... computational -
Descriptor Codes: 0303050004; 0302020003

FINOSIT: A computer program for language

Behavioral Science 1969.

COCOA: A Word Count and Concordance Generator

Berry-Rogghe, G. L. ... 6800 Mannheim 1

... Linguistic Computing Bulletin-
... CODEN: allc-b

p^fcfssing of n.tur.l ,.ngu.O. text,. 

S^ll^ljj ,n.,.«...«-o" 1--VO, 

6(6) 566 

7405529 7405529

COCOA: A word count and concordance generator
BOOK AUTHOR: Berry-Rogghe, G. L. M.; Crawford, T. D.
GafDO#rini^ Spartaco

U Coll Cardiff CF1 XL Wales, United Kingdom
Language and Style- 1974, 7, 2, Spr, 146-148. CODEN:

lgns-a
Series: REVIEW



7615335 76-3-000551
Observations on Context Free Parsing

Statistical Methods in Linguistics. Stockholm. 1976.

71-109

Doc Type: journal article

Descriptors: linguistics - linguistics, general

linguistics, computational - mathematical models
Descriptor Codes: 0302020001

7127990 78-3-000651

... Parsing Algorithm for Natural Language Text Using
a Simple Grammar for Arguments

Association for Literary and Linguistic Computing Bulletin.
PLACE UNKNOWN. 1978, 6:170-76

Doc Type: journal article

Descriptors: linguistics - linguistics, general
linguistics, computational - mechanolinguistics

Descriptor Codes: 0302020003



7600434 7600434

Publishing Computer Output of Processed Natural Language
Texts-I

Last, R.

German U of Hull, England

Association for Literary and Linguistic Computing Bulletin-
1973, 1, 3, Michaelmas, 5-7. CODEN: allc-b

PASP: Some Views on Automated Syntactical Parsing of Large
Language Corpuses
Boot.

Rijksuniversiteit, Utrecht, The Netherlands

ITL. Review of Applied Linguistics- 1974, 23, 23-38.
CODEN: itlg-a

Institute of Applied Linguistics, Blijde Inkomststr. 21,
3000 Louvain, Belgium

Section Heading Codes: 065

A discussion of some system-analysis problems. The
problems, never mathematically defined, concern the syntactic
parsing of large language corpora not artificially restricted
(PASP). Already developed strategies for PASP are discussed,
& a more complete strategy is proposed. Major characteristics
of this strategy are: (1) the ad hoc character of some parts
of it; (2) use of a linear string grammar; (3) definition of
probability rules; (4) translation of probability rules into a
priori rules for string grammar; (5) context sensitivity; &
(6) flexibility of the system. H*

Descriptors: COMPUTATIONAL LINGUISTICS; DESCRIPTIVE
LINGUISTICS; THEORETICAL LINGUISTICS; SYNTAX; GRAMMATICAL
ANALYSIS; DATA PROCESSING AND RETRIEVAL; MATHEMATICS; CONTEXT
SENSITIVE GRAMMAR; TRANSFORMATION RULES

Identifiers: automated syntactical parsing in system
analysis of PASP; linear string grammar, context sensitivity,
probability rules;

49-11199 DOC YEAR: 1973 VOL NO: 49 ABSTRACT NO: 11199
Models for automatic translations.

Vauquois, Bernard

National Center for Scientific [illegible]
Mathematiques et Sciences Humaines 1971 Sum Vol. 9(34)

[illegible] analysis, (b) a model for syntactic analysis, and (c) a
[illegible]

MORPHOLOGY (LANGUAGE), SYNTAX; 28450 [illegible]

INDEX PHRASE: computer [illegible]
syntactic & higher level surface syntax analyses models



7603975 7603975 _ 

Linguistic Data Processing and ALLC Activities in Germany

Lenders, W.

Inst Communication Theory Res & Phonetics, U Bonn, 53
[illegible], Liebfrauenweg 3, Federal Republic of Germany

Association for Literary and Linguistic Computing Bulletin,
1974, 2, 1, 24-27. CODEN: allc-b

6 Sevenoaks Ave., Heaton Moor, Stockport, Cheshire SK4 4AW,
England

Section Heading Codes: 060

(Presented at the Association for Literary and Linguistic
Computing (ALLC) International Meeting, 1973.) Scientific
research in the field of literary & linguistic data processing
has been intensified in the last few years in Germany.
Specialists in text-oriented data processing have met with
specialists concerned primarily with the elaboration of new
methods of text analysis. Various projects are being carried
out at the universities of Saarbrucken, Marburg, Bonn, & at
the Instit for German Language at Mannheim & Bonn. The
projects concern natural language communication between man &
computer, syntactic analysis, machine translation, statistics
& stylistic analysis, automatic lexicography, morphology,
syntactical analysis, new methods in stylistic & mathematical
linguistics, new textual editing techniques, & computer
translation. The ALLC has set up regional branches & improved
information sharing among different projects. The Specialist
Group for Medieval German texts has also intensified its
activities. Burkenroad

Descriptors: DATA PROCESSING AND RETRIEVAL; EXPERIMENTAL
DATA HANDLING; SYNTAX; MACHINE TRANSLATION

Identifiers: linguistic data processing, Germany;



7«02i«i 7W2MB^ SQctlon 5 

Computer Translation with Paired Grammars

Green, R. G.

Sheffield S10 2TN England

Behavior Research Methods and Instrumentation, 1975, 7, 6,
Nov, 557-562. CODEN: brmi-a

The Psychonomic Society, 1108 W. 34th St., Austin TX 78705

Section Heading Codes: 065

In certain types of experiments, the S controls an on-line
computer by giving commands in a simple source language,
possibly a subset of English or a high level computer
language. The commands must then be decoded before they can
be obeyed; in 1 method an ad hoc program is written for the
specific purpose. An alternative is to write a general
purpose translator to decode the source language into a more
primitive target language. A suitable translator is described,
driven principally by "paired" context-free grammars of
source & target languages but also able to accommodate
context-sensitive rules. The technique used could be called
paired-grammar translation: it is based on a context-free
phrase-structure with a top-down, left-to-right parsing
system. Backus-Naur form is used for the grammar notation.
The target grammar is paired with the source grammar in such a
way that every non-terminal symbol in the source grammar is
associated with the same non-terminal symbol in the target
which, by definition, is its translation. The method is
simple; context-sensitivity is handled by special-purpose
subroutines written as needed. With the programming medium,
it is assumed that the language used has fac[illegible]
processing, recursion, & representation of strings. If a
language is not available, FORTRAN would be adequate. Using
the translator has several advantages. It is obviously much
easier to write an ad hoc recognizer for a very primitive
language than for a subset of English [illegible].
Moreover, it is very easy to write & check grammars; minor
modifications are a trivial job, & the finished product is
unlikely to contain hidden bugs. An example is given which
takes into consideration the problem of translating a string
of commands, some of them conditional, out of a language that
uses nested conditionals & into a language that uses jumps to
labels. Modified HA

Descriptors: COMPUTATIONAL LINGUISTICS; MACHINE TRANSLATION;
CONTEXT FREE GRAMMAR; CONTEXT SENSITIVE GRAMMAR

Identifiers: computer translation with paired grammars;
context-free phrase structure, Backus-Naur form notation;
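
The example problem named at the end of the abstract above, turning nested conditionals into jumps to labels, can be illustrated with a short Python sketch. This is not Green's program. The token names (IF, THEN, END, IFNOT, GOTO), the label scheme, and the function name `translate` are all assumptions made for illustration. Each source rule is paired with a target form and applied by a top-down, left-to-right parse.

```python
from itertools import count

# Hypothetical source grammar (BNF), paired rule by rule with a target form:
#   <program> ::= <command> | <command> <program>
#   <command> ::= IF <cond> THEN <program> END  ->  IFNOT <cond> GOTO Ln
#                                                   <program>
#                                                   Ln:
#   <command> ::= <action>                      ->  <action>

def translate(tokens):
    """Translate nested IF/THEN/END commands into jumps to labels."""
    labels = count(1)   # fresh label numbers: L1, L2, ...
    pos = 0             # index of the next unread token

    def program(stop):
        # Parse commands left to right until a stop token or end of input.
        out = []
        while pos < len(tokens) and tokens[pos] not in stop:
            out.extend(command())
        return out

    def command():
        nonlocal pos
        tok = tokens[pos]
        if tok in ("THEN", "END"):
            raise SyntaxError(f"unexpected {tok} at token {pos}")
        if tok != "IF":
            pos += 1
            return [tok]            # plain action: copied unchanged
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "THEN":
            raise SyntaxError("expected IF <cond> THEN")
        cond = tokens[pos + 1]
        pos += 3
        label = f"L{next(labels)}"  # allocated before the body: outer label first
        body = program(stop={"END"})
        if pos >= len(tokens):
            raise SyntaxError("missing END")
        pos += 1                    # consume END
        return [f"IFNOT {cond} GOTO {label}", *body, f"{label}:"]

    return program(stop=set())
```

Under these assumptions, `translate("IF a THEN x IF b THEN y END END z".split())` yields the lines `IFNOT a GOTO L1`, `x`, `IFNOT b GOTO L2`, `y`, `L2:`, `L1:`, `z`. Context-sensitive rules, which the abstract says are handled by special-purpose subroutines, are left out of the sketch.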



7704196 7704196

The Use of the Computer in Linguistic and Literary Research

Pester, A. R.

The Polytechnic, Wolverhampton England WV1 1LY
Association for Literary and Linguistic Computing Bulletin,
1976, 4, 3, 245-250. CODEN: allc-b



7930714 79-3-0OCS54 
Knowledge-Based Parsing 

Gershman, Anatole Vitali

Dissertation Abstracts International, Pt. A US ISSN
0419-4209, Pt. B US ISSN 0419-4217, Ann Arbor, MI, 1979,
40:1275B

Doc Type: journal article

Descriptors: linguistics - linguistics, general;
linguistics, computational - mechanolinguistics; automated
analysis

Descriptor Codes: 0302020003






7704196 7704196

The Use of the Computer in Linguistic and Literary Research
Pester, A. R.

The Polytechnic, Wolverhampton England WV1 1LY
Association for Literary and Linguistic Computing Bulletin,
1976, 4, 3, 245-250. CODEN: allc-b

6 Sevenoaks Ave., Heaton Moor, Stockport, Cheshire SK4 4AW,
England

Section Heading Codes: 4110

Contributions to the Fourth International Symposium of the
Association for Literary and Linguistic Computing (Oxford,
England 5-9 April, 1976) are reviewed. Briefly described are
the salient issues of each of the 43 papers given. These
relate to current work in authorship studies, stylistics,
cluster analysis, concordances, software, transliteration,
syntactic analysis, text editing, thematic analysis, &
photocomposition. The literary bases of the contributions
range from early Greek & Hebraic texts to Braille, modern
French poetry, & dialects of Upper Michigan. AA

Descriptors: APPLIED LINGUISTICS; COMPUTATIONAL LINGUISTICS;
SYNTAX; ADOLESCENT LANGUAGE; READING AIDS FOR THE BLIND;
FRENCH; POETRY; DIALECTOLOGY; STYLISTICS; STATISTICAL ANALYSIS
OF STYLE; EXPERIMENTAL DATA HANDLING; RESEARCH DESIGN AND
INSTRUMENTATION

Identifiers: computer use in linguistic/literary research;



ED0367B3_ ALOp2p€2 

Applied Computational Linguistics.

Hays, David G.

Sep 69 19p.; Paper delivered at the International
Congress of Applied Linguistics, Cambridge,
England, September 1969

EDRS Price MF-$0.76 HC-$1.58 PLUS POSTAGE

Much work in computational linguistics, e.g. the preparation
of concordances and text files, has dealt strictly with the
surface of language, treating it as nothing more than strings
of characters or phonemes. The "classical" scheme, developed
as a result of dissatisfaction with the inability of such
surface systems to deal with problems such as ambiguity,
consists of surface processing, syntactic processing and
semantic processing, with the object of obtaining an
expression for the content of the input text; work with
programming systems generating sentences with
transformational [illegible] representative of this tradition.
It must be recognized, however, that the essential
characteristic of language is its connection with information
and that language is the external manifestation of the human
capacity to process symbols in such ways that information is
retained. This capacity should be the object of linguistics,
and rules of grammar should describe those "action patterns"
which underlie human symbol processing. Recent work in applied
computational linguistics recognizes the importance of this
conception and should therefore lead to wider computer
applications, perhaps even to real man-machine conversations
and the concomitant use of the computer as an imaginative
consultant for a wide range of problems. (FWB)

Descriptors: Analog Computers/ *Applied Linguistics/
*Communication (Thought Transfer)/

an algorithm for assigning parts of speech from morphology; an
algorithm for automatic syntactic analysis; an experiment in
construction of a 'structure dictionary' for extracting purposes;
experiments in using frequency and/or syntactic criteria for indexing
and extracting purposes; development of word government tables as the
basis of a semantic component of an automated text analysis system.

DESCRIPTORS: (*Subject indexing, Automatic), (*Computational

(SMART) project. The first paper: "Content Analysis in
Information Retrieval" by S. F. Weiss presents the results of
experiments aimed at determining the conditions under which
content analysis improves retrieval results as well as the
degree of improvement obtained. The second paper: "The
'Generality' Effect and the Retrieval Evaluation for Large
Collections" by G. Salton assesses the role of the generality
effect in retrieval system evaluation and gives evaluation
results for the comparisons of several document collections of
distinct size and generality in the areas of documentation and
aerodynamics. In the third paper: "Automatic Indexing Using
Bibliographic Citations" by G. Salton, citations are used
directly to identify document content and an attempt is made
to evaluate their effectiveness in a retrieval environment.
The final paper: "Automatic Resolution of Ambiguities from
Natural Language Text" by S. F. Weiss discusses the
evolutionary process by which ambiguities are [illegible] and
classifies ambiguities into three classes: true, contextual
and syntactic. (For the entire SMART project report see LI 002
719, for parts 2-5 see LI 002 721 through LI 002 724.) (NH)

Current efforts to take advantage of the special virtues of
the computer as an aid in text analysis are described.
[illegible] constructs, category construction, and contingency
analysis are discussed and illustrated. Mechanical techniques
for reducing human labor when studying large quantities of
verbal data have been sought at an increasing rate by
researchers in the behavioral sciences. Whatever the purpose of
[illegible], if it is to have a scientific character, it must
involve an attempt to reduce natural language data, by formal
rules, to measures reflecting theoretically relevant properties
of the text, its source, or its audience effects. At the
present time, there is no one theory or method dominating the
field of natural language analysis. Although much work is
currently being expended to implement a finite set of rules on
the computer, little has been accomplished that is directly
useful to researchers in the social sciences. (Author/CK)

[illegible] Institute for Applied Technology, National Bureau of
Standards. Consideration is first given to indexes compiled by or with
the aid of machines, including citation indexes [illegible]
listing is exemplified by Keyword-in-Context (KWIC) and othe[illegible]
Developments in such areas as automatic classification and
categorization, computer use of thesauri, statistical association
techniques, and linguistic data processing are described. A major
question is that of evaluation, particularly in view of evidence of
human inter-indexer inconsistency. It is concluded that indexes based
on words extracted from text are practical for many purposes today,
and that automatic assignment indexing and classification experiments
show promise for future progress.

Descriptors: *Automatic indexing, Indexes (documentation),
Computational linguistics, Machine translation, Subject index terms,

Abstract: A previous report (AD-A021 537) defined a series of 14 novel
measures for determining the comprehensibility of English text on the
basis of current psycholinguistic and Structure-of-Intellect oriented
concepts. That report not only suggested the potential usefulness of
the measures, but also confirmed the feasibility of automating the
calculation of these measures. The present report takes the next
logical steps in implementing these measures for computer application.
First, these measures are analytically defined and described. Then,
selected measures are subjected to laboratory experimental
investigation using Air Force Manuals, Career Development Course
materials, and USAF Technical Orders as sample texts. Results of these
experiments are presented. An automatic calculation method is then
developed for each of the 13 selected measures. The structure of the
programming specifications is modular and is intended to calculate the
measures for variable size blocks of texts. Flow charts and summary
descriptions of the program attributes are also presented, together
with explanations of run request syntax, sample measures calculations,
and output formats. This report then constitutes a complete definition
of the program suitable for future implementation on an automatic data
processing system.

Descriptors: *Psycholinguistics, *Reading, *Intelligibility,
*Information processing, Computer programming, *Computational
linguistics, English language, Instruction manuals, Courses (Education),
Text processing, Comprehension, Measurement, Syntax, Semantics,
Assessment, Computer programs, Flow charting
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raaair« compL-ta Syntactic _aniIy«l^ o eoc«put-r -nd 

cbiiipatar Tha l^ndlvldual words or .» imlnatad^ 
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racognlzSd. T^,* ^^^Pyt s a H.t ^^^^^''^.^^ for;^ human 

a scraanad a«capt ion_ !l t.rms .1 H antar an 
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Ihtagr.tad Languaga Data^Bas.^^^ ^^^^^ synonyms to 

tarms a«ractly tp ^l^" unracogoJii^^ «rm» for tachnlc.L 

irni:5i-..r.r .^h^.Ul^iy"..-.^ inat ^ ..P^ura ,P, ,ow. an 
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to leet tke H^'"'^!, ^'acS^^ I7b ^70. i prograanied analyier is 
described in ^'s 170 ^69 and j^*i|olc-ap procedure and a 

presented *hich\ •»Pi°I^* ^ ?S?dict±onIry contains less 

loateiMensitiT^ coiputational gram»*r. Tne^|| ^ heiiristically 

llann^ee bandred . ««BCto| word IjJ^jf i^|^|||y*|n5 language similar 

developed grammar i« * r"*««ar±«s to all 'words in an input 

Jo loIlT. fh3 analfxer.assigns cat|gori|| to aii^ .^^ phrases. 

te.t and i^^^^^i" .'*°S|i'oaSr^ll^ Irl rl^lac^^ by antecedents, 
ielatire Pronouns and the^pro^^^n * to syntactic analysis rs 

It is shown that this co-P^tationai approach^to ^j^^^^ reguire 

cconoiicall> feasible «°%„"||"to!eratt -inlr etrors . Th*. economy 
.iniial syntactic "Jlniririts liSitH dictionary, relatiTeiy small 
S^mb^r of n|-pui|?li^li'"5ltrait restriction to technical English. 

(Xuthor) - 

, ^. . . i-a+T+A**!* Sttbiect indexingLf i 

irseaiSTOJS: 'tf^'SHJi^ iiniilUai t«rii**l). imgaisti", 

Ip25TiPiSSS: tEcoa ^1 
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Engineering. 

Document Not Available from EDRS.

f-iniiif icatlon tibles for an ar.a of specialization. Thes^ 
tie" quirinab.i thS construction 5^^P^«*'"|^ 'l""""*^^ 
ITS. uith far less Intellectual effort than now required, but 
:;n, rStl'n '•"consensus of S.p.rt^ ^opinion through the 

' 's:i-^^p:o?;r="2u^^m:t^onr:^c"^^;^^citi.n/ c^^^ _ 

concordance./ ^^^'^^V-^i,,;,:!^^' S^lr^r lu?'// 
*Information Systems/ *Semantics/ Sentences/
*Vocabulary/ Word Lists

Evaluation of Automated Natural Language Processing in the Further
Development of Science Information Retrieval

New York Univ., N.Y. Linguistic String Project. National Science
Foundation, Washington, D.C. Div. of Science Information.
Rept no: String Program-10
Abstract: The report describes advances in computerized natural
language processing (NLP) and relates them to [illegible]
functions of information systems. Section 1 [illegible]
in the information field which have led to a renewed interest in NLP, and
how NLP programs could be used to provide [illegible] information
[illegible] operating on natural language data bases. It describes the
basis for NLP programs in the inherent relation between information
[illegible] structure. Section 2 describes the stages of processing
which carry largely unrestricted natural language input of the type
encountered in scientific communications into data structures suitable
for advanced types of information processing. Section 3 describes a
newly developed clustering program for generating informationally
significant word classes from documents in particular subject areas.
Section 4 presents examples and suggestions as to how NLP
techniques currently available or under development could be applied
to information systems. Section 5 suggests directions for further
research in NLP as a foundation for natural-language-based information
systems in the future.

Descriptors: *Computational linguistics, *Information [illegible],
Semantics, Syntax, Automatic language processing, Data processing,
Technical writing, Transformational grammars, Clustering

An analysis of a large selection of Carlyle's prose was done
by means of a linguistic & quantitative method of syntactic
analysis & a computerized parsing procedure. The study had 2
objectives: to identify stylistically significant elements of
Carlyle's syntax & to determine the profitability of
large-scale automatic syntactic analysis in describing prose
style. The initial syntactic analysis was performed by a
computerized parsing routine developed by D. C. Clarke & R.
[illegible]. Masses of quantitative information about syntactic
features were analyzed with statistical methods of comparison
& correlation. These quantitative stylistic features were
[illegible] in conjunction with close critical analysis of
specified passages. The stylistic habits known to be
peculiarly Carlylean (periodicity, accumulation, &
irregularity) were all revealed by the study. A growing
tendency to omit important syntactic elements or to introduce
[illegible] into [illegible] syntax was noted in the
chronological development of his style. Carlyle stretched the
capacities of English syntax to fit his own needs. This is
the broadest-based study of its kind so far attempted, & the
stylistic features discovered apply more generally than
earlier studies based on smaller more
carefully selected passages. S. Karganovic

Descriptors: STYLISTICS; STATISTICAL ANALYSIS OF STYLE;
SYNTAX; LITERARY GENRES; DATA PROCESSING AND RETRIEVAL

Identifiers: quantitative computer analysis of syntax in
prose style; Carlyle;

Abstract: This volume is the third in a series of detailed reports on
a working computer program for string decomposition of sentences.
This volume contains outputs obtained by the program for five short
scientific texts. Each successive sentence of the text to be analyzed
is entered into the computer without pre-editing. The program looks
up each word of the sentence in a grammatical dictionary which gives
for each word all its grammatical classifications, without reference to
the way the word is used in the given article. The program then
decomposes each sentence into a very short elementary sentence which
is the grammatical center of the original, plus various strings of
words: each string has a fixed grammatical structure, and [illegible] the
elementary sentence or one of its adjoined strings. (Author)

DESCRIPTORS: (*Computational linguistics, Programming (Computers)),
Dictionaries, Reports, Analysis, English language, Grammars

IDENTIFIERS: Strings (Linguistics), Parsing, Sentences, Computer
analysis
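
The dictionary lookup step described in the abstract above, in which every word receives all of its grammatical classifications regardless of how it is used, might be sketched in Python as follows. The dictionary entries, the class names, and the function name `classify` are hypothetical and chosen only for illustration. They are not taken from the reported program, and the string decomposition step itself is not shown.

```python
# Hypothetical mini-dictionary; the entries are illustrative only.
GRAMMATICAL_DICTIONARY = {
    "the": ["article"],
    "program": ["noun", "verb"],
    "looks": ["verb", "noun"],
    "up": ["adverb", "preposition"],
    "word": ["noun", "verb"],
}

def classify(sentence):
    """Return (word, all listed classes) pairs, ignoring context entirely."""
    pairs = []
    for word in sentence.split():
        key = word.lower().strip(".,;:")   # normalize case and punctuation
        pairs.append((key, GRAMMATICAL_DICTIONARY.get(key, ["unknown"])))
    return pairs
```

Because "program" is listed as both noun and verb, the lookup returns both classes. Resolving which one applies in a given sentence is left to the later decomposition stage.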

Document Not Available from EDRS.

The development of automatic indexing, abstracting, and
extracting systems is investigated. Part I describes the
development of tools for making syntactic and semantic
distinctions of potential use in automatic indexing and
extracting. One of these tools is a program for syntactic
analysis (i.e., parsing) of English, the other is a dictionary
of English word government patterns. Part II reports on the
research program in describing and abstracting pictorial
structures. This work is concerned with whether it is possible
to construct a symbolic representation of a gray level picture
which can provide essentially the same information as the
picture itself. Based on a series of experiments using human
subjects describing aerial terrain photographs, it was
possible to make certain observations concerning deductive and
metadescriptive aspects of description, i.e., the "set,"
contextual knowledge, and certainty of the subject.
(Author/NH)

Descriptors: *Abstracts/ *Automatic Indexing/ *Automation/
Documentation/ Experimental Programs/ *Information Processing/
*Information Systems/ Linguistics/ Syntax

Identifiers: *Automatic Abstracting

Development of Language Analysis Procedures with Application to
Automatic Indexing

Ohio State Univ., Columbus. Computer and Information Science Research
Center

AUTHOR: Young, Carol Elizabeth

C2321I2 FLD: 5G, 88A* USGRDR7406

Apr 73 31[illegible]p*

REPT NO: OSU-CISRC-TR-73-2

GRANT: NSF-GN-534.1

MONITOR: 18

ABSTRACT: The paper presents (1) a theoretical framework within which
relationships among words are defined and (2) algorithms which have
been developed to identify these relationships. The algorithms which
have been developed effect four processes: the assignment of each word
to a grammatical class, the identification of phrases and of clauses,
and the assignment of case grammar roles. These linguistic analysis
procedures are to be used to construct graphical representations of
sentences. The graphs are proposed as the basis of a generalized
indexing system. Portions of this document are not fully legible.

DESCRIPTORS: *Automatic indexing, *Phrase structure grammars,
*Computational linguistics, *Syntax, Words (Language), Semantics,
Schematic diagrams, English language

IDENTIFIERS: NSFSIS

PB-227 088/2 NTIS Prices: PC$7.25/MF$1.45
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Ehll0048 IR602327 . _ _ . 

An Analysis of Methods for Preparing a Large Natural
Language Data Base

Southwest Regional Laboratory for Educational Research and
Development, Los Alamitos, Calif.

16 Feb 71 2?p.

Report No.: SWRL-TM-5-71-02

EDRS Price MF-$0.76 HC-$1.95 PLUS POSTAGE

Relative cost and effectiveness of techniques for preparing
a computer compatible data base consisting of approximately
one million words of natural language are outlined. Considered
are dollar cost, ease of editing, and time consumption.
Facility for insertion of identifying information within the
text and updating of a text by merging with another text are
[illegible] attention. It is concluded that Magnetic Tape
[illegible] peripheral [illegible]
decision are discussed. (Author)

Descriptors: Computers/ *Cost Effectiveness/ *Data Bases/
Data Processing/ Electronic Data Processing/ *Equipment/
*Information Processing/ Information Storage/ *Input Output
Devices/ Man Machine Systems/ Office Machines/ On Line Systems
/ Optical Scanners/ Typewriting

Identifiers: Administrative Terminal System/ ATS/ Cathode
Ray Tube Terminals/ CRT/ Dataplex/ Flexowriter/ Keypunches/
Magnetic Tape Selectric Typewriter/ MTST/ Optical Character
Scanning/ Teletypes



ED145829 IR00524D

Evaluation of Automated Natural Language Processing in the
Further Development of Science Information Retrieval. String
Program Reports No. 10.

Sponsoring Agency: National Science Foundation, Washington,
D.C. Div. of Science Information.
Grant No.: GN39[illegible]

these goals.- An pyarvi»w r«n«wed interest in 

informition fl.ld which ^|v. *° 1" programs for 

natural 1«nguig- J»'-°""lC2ae to Tulfni 1 anSu.g.-bis.d 
processing , Ind thi relationship bet-.en 

functions of information SySt.ms.^aneJ ^l^^-^^'*^^^ processing 

information '^n?ut of Scientific 

m^:^. In-rS^n .tri^^:r .U.^. information 
processing--parsing, structural transformations of parse
outputs, and arriving at an underlying semantically meaningful
representation--are outlined. The report also describes
research related to the computerized discovery of semantic
structures in science subfields; this research is concerned
with the problem of structuring a data base which is given in
natural language. Examples and suggestions for the application
of techniques currently available or under development to
information problems, and suggestions for further research in
the language area of information science are presented.
(Author/KP)

Descriptors: Artificial Intelligence/ *Automatic Indexing/
*Computational Linguistics/ Evaluation/ Information Processing
/ *Information Retrieval/ Information Systems/ Language
Classification/ Man Machine Systems/ *Science Materials/
*Semantics

Identifiers: *Natural Language Processing

Research on Synonymy and Antonymy: A Model and Its Representation
Maryland Univ College Park Computer Science Center ([illegible]03018)
Technical rept.

AUTHOR: Edmundson, H. P.; Epstein, M. N.

I4631L2 FLD: 5G, 56J USGRDR7215
Mar 72 25p

REPT NO:

CONTRACT: N00014-67-A-0239-0004
PROJECT: NR-049-261

ABSTRACT: The paper describes a modified and extended version of an
axiom system that constitutes a mathematical model of synonymy and
antonymy. It also outlines the data structures used in the computer
representation of the model. The intent of this research is to refine
an axiomatic model previously proposed to better reflect the latent
structure of synonym dictionaries and to influence their future
compilation. Particular attention is given to providing a convenient
computer representation for testing the current set of 13 axioms. The
computer-based system provides an automated determination and
verification of existing relations among dictionary entries and
generates new relations among words to be included in such a
dictionary, as well as providing a measure of the binding power among
related groups of words. (Author)

DESCRIPTORS: (*Semantics, Mathematical models), (*Computational
linguistics, Semantics), Dictionaries, Data processing systems

IDENTIFIERS: Synonymy, Antonymy

AD-743 892 NTIS Prices: PC$3.00/MF$0.95



PART-OF-SPEECH IMPLICATIONS OF AFFIXES

Lockheed Missiles and Space Co Palo Alto Calif (210110)
AUTHOR: Earl, Lois L.

3295L4 FLD: 5G USGRDR6711
4 Feb 66 7p

MONITOR: 18

Research supported in part by ONR.

Availability: Published in Mechanical Translation and Computational
Linguistics v9 n2 p38-43 Jun 1966.

ABSTRACT: The paper describes a systematic investigation of the extent
to which the part of speech of words can be identified from their
prefixes and suffixes. The results indicate that it is possible to
determine, with 95 per cent accuracy, the inclusive part of speech of
an affixed word from a consideration of its prefixes, suffixes, and
length. By 'inclusive' parts of speech we mean a string that will
include all of the parts of speech assigned by both dictionaries
considered but that may include one or two extraneous parts of speech.
The extra parts of speech will differ according to the class of words,
as adjectives may have an extra part-of-speech 'noun' or 'adverb,'
while nouns may have an extra part-of-speech 'verb.' The
part-of-speech implications of seventy-two prefixes and of
eighty-seven suffixes are given. (Author)

DESCRIPTORS: (*English language, Computational linguistics), Grammars,
Classification, Algorithms
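
The idea of reading an "inclusive" set of parts of speech off a word's affixes can be illustrated with a minimal Python sketch. The suffix table below is hypothetical and far smaller than the seventy-two prefixes and eighty-seven suffixes the paper reports. It is not Earl's data, the function name `inclusive_pos` is an assumption, and the sketch ignores prefixes and uses length only as a crude guard against very short words.

```python
# Hypothetical suffix table for illustration only; not Earl's data.
SUFFIX_POS = {
    "ness": {"noun"},
    "tion": {"noun"},
    "ous": {"adjective"},
    "ize": {"verb"},
    "ing": {"verb", "noun", "adjective"},
    "ly": {"adverb", "adjective"},
}

def inclusive_pos(word):
    """Return every part of speech the word's suffix allows (may over-include)."""
    word = word.lower()
    # Try longer suffixes first so the most specific match wins.
    for suffix in sorted(SUFFIX_POS, key=len, reverse=True):
        # Require a stem of at least two letters before the suffix.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return set(SUFFIX_POS[suffix])
    return set()   # no known suffix: nothing can be inferred
```

The returned set is deliberately generous: "quickly" yields both adverb and adjective, mirroring the abstract's point that an inclusive string may carry an extraneous part of speech rather than miss a correct one.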

String Transformations in the REQUEST System

IBM Research Div. Yorktown Heights NY 10598

The Finite String, 1974, [illegible]

Center for Applied Linguistics, 1611 N Kent St., Arlington
VA 22209 (Published as part of the American Journal of
Computational Linguistics, US, of The Finite String, 1974, Vol.
11, No. 1)

to bmrfniX transf ormat lonal oparat ions not on W^on t^m^ fuii 

^?lliif7sr construction,^ idiom handling. » th. suppress ion^of 
n-'-'nu^Sers'lJ u.w.nt^ surf. « pars... -hHe no ™...s^. 
panaca for transformational P'-^'i^B; -^""'i , "'t^e?,. rapid S 
^ransformaticn, -n ^SSf^-'^^^^-' P--»^^^«^ ^'n a nu.«b.r of 
f^^drtlnt'^lre-r-i .out* clr^iipond^ng'^dvers, i™P.ct on th. 
^T^ror;^. l"icon! th, co™pl,xity of th| surf.c. grammar. 6 

tr,i number o;.*>^^'«- |»§|SAT?S''"''iND «NERATiyE GRAMMAR , 

'■'I'^WAlr.. ..n.ni ,r.n,for..„o« .P ..Oo... Sy—.^ 

User's Guide to the SOLAR KWIC File

System Development Corp Santa Monica Calif*Advanced Research Projects
Agency, Arlington, Va. (339900)

Special technical rept.
AUTHOR: Diller, Timothy C.; Heath, Frank
C5373A0 FLD: 5B, 92D USGRDR7517

May 75 23p
REPT NO: TM-5292/008/00

CONTRACT: DAHC15-73-C-0080, ARPA Order-2254
MONITOR: 18

ABSTRACT: The document contains a general explanation of the KWIC file
of SOLAR (a Semantically-Oriented Lexical Archive). It is intended as
an introduction and reference manual for the on-line user, the casual
reader, or the data collector.

DESCRIPTORS: *Semantics, *Words (Language), Speech recognition, English
language, Information retrieval, Data processing, Indexes,
Computational linguistics, Natural language, Manuals

IDENTIFIERS: KWIC indexes, SOLAR (Semantically Oriented Lexical
Archive), Semantically oriented lexical archive, NTISDODA

AD-A011 173/1ST NTIS Prices: PC$3.25/MF$2.25
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Th« •n?-Hi« 1 _ meeting of iirm ACL Sectioh 7 

Moyn* d: 

OuHhi< Coll City O >••« York Nv 1002 1 

CbiiiQutirs arid th* HQiMihiti*!- 1973. 7 (6), 4\^3-415. CODEN: 

cohti-a _ _ 

Oyaahs.Colltga PrMS. Flushing NY il367: 
Section Haading_ Codas i_ 060 

An outline report of the eleventh annual meeting of the
Association for Computational Linguistics, held August 1 and
2, 1973 at the University of Michigan in Ann Arbor. Research
on speech recognition and understanding continues to be a
topic of major interest in computational linguistics (CL)
around the country. Most of the speech projects are supported
by ARPA and are intended to complement each other and
[illegible] on the ARPA network. The traditional approach to
speech recognition in the past was to rely on engineering
developments and filtering devices in the segmentation of
phonetic elements. The trend is towards more reliance on
linguistic analysis and "understanding" of an utterance.
Papers were presented which concern automatic parsing of
Chinese, an automatic retrieval system with natural language
communication, and a language developed for communication with
computer by nonhuman primates. The four papers in the syntax
session dealt with a computer model of Panini's grammar,
semantic-directed translation of context-free languages, the
testing of a grammar of English with no cycle, and a model of
a "performance" grammar of English. The four papers in the
lexical studies session were concerned with morphological,
syntactic, and semantic analyses in lexicography and
construction of dictionaries. One paper in this session
reported the use of lexicostatistical devices for arriving at
relationships among Indo-European languages. AA

Descriptors: COMPUTATIONAL LINGUISTICS; EXPERIMENTAL DATA
HANDLING; DATA PROCESSING AND RETRIEVAL

Identifiers: computational linguistics; conference report;
annual meeting of Association for Computational Linguistics

The MIND System: A Data Structure for Semantic Information Processing

Rand Corp Santa Monica Calif (296600)
AUTHOR: Shapiro, Stuart Charles

I33iat3 FLD: 5G, 5B, 9B, 56J, 88B, 62B, 70C USGRDR7202
Aug 71 172p

REPT NO: R-837-PR

CONTRACT: F44620-67-C-0045

ABSTRACT: A description is given of the data structure used in the
semantic file of the MIND system (Management of Information through
Natural Discourse), and of the procedures for manipulating information
stored in the file. The MIND system consists of nested and chained
modules of high-level programming language statements; it is
relatively easy to modify, either for improvement or for adaptation to
specialized applications. The major features of the data structure
are: it is a net whose nodes represent conceptual entities and whose
edges represent relations that hold between entities; some nodes of
the net are variables, and are used in constructing general statements
and deduction rules; each conceptual entity is represented by exactly
one node in the net from which all information concerning that entity
is retrievable; nodes can be identified and retrieved either by name
or by a sufficient description of their connections with other nodes.
The use of the system to experiment with various semantic theories is
demonstrated by examining several questions of current linguistic
theory. (Author)

DESCRIPTORS: (*Semantics, *Data processing systems), (*Information
retrieval, Command + control systems), Programming (Computers),
Computational linguistics, Management planning

IDENTIFIERS: MIND (Management of Information through Natural Discourse),
Management of information through natural discourse, [illegible]
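
Editorial illustration: the data structure summarized in the abstract above (a net with one node per conceptual entity, labeled edges for relations, and retrieval either by name or by a description of connections) can be pictured with a minimal sketch. This is not Shapiro's implementation. The class `SemanticNet`, its methods, and the sample entities are hypothetical names chosen for illustration, and the variable nodes and deduction rules mentioned in the abstract are left out.

```python
class SemanticNet:
    """Toy semantic net: one node per entity, labeled edges for relations."""

    def __init__(self):
        self.nodes = set()   # each conceptual entity appears exactly once
        self.edges = {}      # edges[source][relation] -> set of target nodes

    def relate(self, source, relation, target):
        """Record that `relation` holds from `source` to `target`."""
        self.nodes.update((source, target))
        self.edges.setdefault(source, {}).setdefault(relation, set()).add(target)

    def about(self, name):
        """Retrieve by name: everything stored at the entity's single node."""
        return {rel: sorted(ts) for rel, ts in self.edges.get(name, {}).items()}

    def find(self, description):
        """Retrieve by description: nodes for which every (relation, target) pair holds."""
        return sorted(
            n for n in self.nodes
            if all(t in self.edges.get(n, {}).get(rel, ()) for rel, t in description)
        )


# Hypothetical sample data, not taken from the report.
net = SemanticNet()
net.relate("Fido", "isa", "dog")
net.relate("Fido", "owner", "John")
net.relate("Rex", "isa", "dog")

print(net.about("Fido"))                              # retrieval by name
print(net.find([("isa", "dog")]))                     # retrieval by description
print(net.find([("isa", "dog"), ("owner", "John")]))  # a narrower description
```

Because every entity is stored once, `about` only needs to look at a single node, which is the property the abstract emphasizes.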

Automatic syntactic analysis

BOOK AUTHOR: Foster, J. M.

Wood, Derick

Applied Mathematics, McMaster U.

International Journal of Computer Mathematics [illegible]
(2/3), 189-191. CODEN: ijcm-a

New York: American Elsevier, 1970. For the United [illegible]
Gordon & Breach Science Publishers, 440 Park Ave. [illegible]
York, N.Y. 10011; and for all other countries [illegible]
Breach Science Publishers, 42 William IV St. [illegible]

Section Heading Codes: 060 LANGUAGE: Engl.

A favorable review of a work which is the first in formal
language theory to deal solely with automatic syntactic
analysis. It is an introductory text, not a work that covers
the area of syntactic analysis exhaustively. It [illegible]
light that a reader should approach this book. [illegible]
written with many worked examples that make it a joy to read.
Also contributing to this enjoyment is its size ([illegible]
pages) which means that it can be read at one sitting.
Topics covered include: (1) context-free grammars; (2)
[illegible] parsing
methods; (5) transformations on grammars; and (6) using [illegible]

7302244

smln-a [illegible] P.O. Box [illegible]

Språkförlaget Skriptor.

Section Heading Codes: 060

Discussion of [illegible] REL (Rapidly Extensible Language)
English [illegible] features, case structure, [illegible]
transformational grammar [illegible] incorporation of [illegible]
of pronouns, and parsing. [illegible] inclusion of
case grammar is new [illegible]

ENGLISH; CASE [illegible]

SYNTACTIC ANALYSIS OF ENGLISH BY COMPUTER: A SURVEY

Bolt, Beranek and Newman, Inc., Cambridge, Mass. (060 100)

AUTHOR: Bobrow, Daniel G.

dSOSGl FLD: 5G USGRDR6602

196[illegible] 23p

Distribution: No limitation.



ABSTRACT: The review [illegible]
classes among English [illegible]
classifications of [illegible]
syntactic structure [illegible]
given; reference is [illegible]
and goals and present [illegible]
rules for [illegible]
associated [illegible]
success of [illegible]

[illegible] with a survey of the determination of
[illegible] Most programs doing syntactic analysis
[illegible] to find possible
[illegible] ambiguities during the
[illegible] of those theories of
[illegible] processing by computer.
[illegible] each grammar and a description of the
[illegible] with a sentence by each processor are
[illegible] which have been written.
DESCRIPTORS: (*English language, Computational [illegible]
Analysis), Semantics, State-of-the-art [illegible]
Transformational grammars

IDENTIFIERS: Words, Sentence, Tree diagrams (Linguistics)

[illegible] parser for recalcitrant grammars
International Journal of Computer [illegible]

[illegible] St., London [illegible], England.

Section Heading Codes: 060

[illegible] being used in a PL/I
[illegible] a syntax-directed parsing [illegible] a highly
compiler for the CDC 6600 [illegible] efficiency.
restricted grammar of the class LL [illegible] the grammar. These
[illegible] for those cases [illegible] make decisions without a
cases are handled by oracles that can make a [illegible] the
full-scale syntactic [illegible] of syntax equations, semantic
SYNtax DIrected PARser consists of [illegible]
routines, and token class definitions. [illegible] The
a PARSE procedure in PL/I together with certain [illegible] scanner.
PARSE procedure works in conjunction with a [illegible]
designed to allow look-ahead [illegible] through the
[illegible] for a parsing
machine. The instruction set of the parsing machine is
described, and an example of the compilation of a syntax
equation is given.

Descriptors: COMPUTATIONAL LINGUISTICS; SYNTAX; DATA
PROCESSING AND RETRIEVAL; GRAMMATICAL ANALYSIS

Identifiers: syntax-directed parser; recalcitrant grammars

The lexical subclasses of the Linguistic String Parser

Fitzpatrick, Eileen; Sager, Naomi

New York U., NY 10003

American Journal of Computational Linguistics, 1974, 1,
Microfiche 2, 1-70. CODEN: ajcl-d

Center for Applied Linguistics, 1611 N. Kent St., Arlington
VA 22209 ([illegible] The Finite String as of 1974, Vol. 11, No.
1)

Section Heading Codes: 062

The New York University Linguistic String Parser (LSP) is a
working system for the syntactic analysis of English
scientific texts. It consists of a parsing program, a
large-coverage English grammar, and a lexicon. The grammar's
effectiveness in parsing texts is [illegible]
substantial body of detailed well-formedness restrictions
which eliminate most incorrect syntactic parses which would be
allowed by a weaker grammar. The restrictions mainly test for
compatible combinations of word subclasses. The 109
adjective, noun, and verb subclasses, as well as others not
presented here, are defined in such a way that they can be
used as a guide for classifying new entries to the LSP lexicon
and as a linguistic reference tool. Each definition includes
a statement of the intent of the subclass, a diagnostic frame,
sentence examples, and a [illegible]
dictionary. The subclasses are defined to reflect precisely
the grammatical properties tested for by the restrictions of
the grammar. Where necessary for clarifying the intent of the
subclass, three additional criteria are employed: [illegible],
implicit and coreference, and paraphrase. The subclasses have
been defined so as to be consistent with a [illegible]
transformational analysis which is currently being
[illegible]

Descriptors: ENGLISH; DATA PROCESSING AND RETRIEVAL; SYNTAX;
GRAMMATICAL ANALYSIS; [illegible] LANGUAGES; TRANSFORMATIONAL AND
GENERATIVE GRAMMAR; COMPUTATIONAL LINGUISTICS

Identifiers: Linguistic String Parser; syntactic analysis for
English scientific texts
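
Editorial illustration: the abstract above says the LSP restrictions "mainly test for compatible combinations of word subclasses" and thereby eliminate parses a weaker grammar would allow. A minimal sketch of that idea follows. It is not the LSP itself. The lexicon, the subclass names `SINGULAR` and `PLURAL`, and the functions `number_agreement` and `apply_restrictions` are hypothetical, chosen only to show a restriction acting as a filter over candidate analyses.

```python
# Hypothetical mini-lexicon: word -> (major class, set of subclasses).
LEXICON = {
    "cell":    ("N", {"SINGULAR"}),
    "cells":   ("N", {"PLURAL"}),
    "divides": ("V", {"SINGULAR"}),
    "divide":  ("V", {"PLURAL"}),
}


def number_agreement(subject, verb):
    """Restriction: subject and verb must share a number subclass."""
    _, subj_sub = LEXICON[subject]
    _, verb_sub = LEXICON[verb]
    return bool(subj_sub & verb_sub & {"SINGULAR", "PLURAL"})


def apply_restrictions(candidates, restrictions):
    """Keep only the candidate (subject, verb) analyses that pass every restriction."""
    return [c for c in candidates if all(r(*c) for r in restrictions)]


# Candidate analyses a weaker grammar might allow (assumed data).
candidates = [("cell", "divides"), ("cell", "divide"), ("cells", "divide")]
print(apply_restrictions(candidates, [number_agreement]))
```

Keeping the subclass tests separate from the lexicon mirrors the abstract's point that subclass definitions guide how new lexicon entries are classified.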

syntactic [illegible] (2) aggregation of the
individual words into phrases [illegible] aggregation of phrases into
[illegible] with component phrases [illegible] display a main or
roles each plays and [illegible] focuses on
independent clause. [illegible] different types of
the relative [illegible] and the measure of structural or
verb phrases are used, and the [illegible]
stylistic complexity. [illegible] / componential analysis/